
Introduction to Computing at SIO:

Notes for Fall class, 2012


Peter Shearer
Scripps Institution of Oceanography
University of California, San Diego
November 30, 2012

Contents

1 Introduction
  1.1 Scientific Computing at SIO
    1.1.1 Hardware
    1.1.2 Software
  1.2 Why you should learn a real language
  1.3 Learning to program
  1.4 FORTRAN vs. C
  1.5 Python

2 UNIX introduction
  2.1 Getting started
  2.2 Basic commands
  2.3 Files and editing
  2.4 Basic commands, continued
    2.4.1 Wildcards
    2.4.2 The .login and .cshrc files
  2.5 Scripts
  2.6 File transfer and compression
    2.6.1 FTP command
    2.6.2 File compression
    2.6.3 Using the tar command
    2.6.4 Remote logins and job control
  2.7 Miscellaneous commands
    2.7.1 Common sense
  2.8 Advanced UNIX
    2.8.1 Some sed and awk examples
  2.9 Example of UNIX script to process data
  2.10 Common UNIX command summary

3 Generic Mapping Tools
  3.1 Installation of Fink, GMT, other useful tools

4 Fortran
  4.1 Fortran history
  4.2 Texts and manuals
  4.3 Compiling and running F90 programs
    4.3.1 The first program explained
    4.3.2 How to multiply two integers
  4.4 An important historical digression
    4.4.1 Alternate coding options
  4.5 Making a formatted trig table using a do loop
    4.5.1 Fortran mathematical functions
    4.5.2 Possible integer vs. real problems
    4.5.3 More about formats
  4.6 Input using the keyboard
  4.7 If statements
    4.7.1 If, then, else constructs
  4.8 Greatest common factor example
  4.9 User defined functions and subroutines
    4.9.1 Subroutines
    4.9.2 Linking to subroutines during compilation
  4.10 Internal procedures
  4.11 Extended precision
    4.11.1 Integer sizes
  4.12 Arrays
    4.12.1 Checking for problems with the -fcheck=bounds option
    4.12.2 More about random numbers
    4.12.3 Arrays as subroutine arguments
  4.13 Character strings
  4.14 I/O with files
  4.15 More about multi-dimensional arrays
    4.15.1 Arrays of strings
  4.16 A more complex example of data processing
  4.17 Example sorting routine from Numerical Recipes
  4.18 Example of saving values in a subroutine
  4.19 Complex numbers
  4.20 Array operations in F90
  4.21 Allocatable arrays
  4.22 Structures in F90
  4.23 Writing fast programs
    4.23.1 The -O option
  4.24 Fast I/O in Fortran
    4.24.1 Ascii versus binary files

5 Fun programs
  5.1 Tic-tac-toe
  5.2 Fractals
    5.2.1 Plotting using MatLab
    5.2.2 Plotting using Python
    5.2.3 Good targets and more about fractals

6 Lisa's Python Notes
  6.1 Introducing Python
  6.2 An Interactive Python session
  6.3 A first look at NumPy
  6.4 Variable types
  6.5 Data Structures
    6.5.1 Lists
    6.5.2 More about strings
    6.5.3 Data structures as objects
    6.5.4 Tuples
    6.5.5 Dictionaries!
    6.5.6 N-dimensional arrays
  6.6 Python Scripts
  6.7 A first look at code blocks
    6.7.1 The for loop
    6.7.2 If and while blocks
    6.7.3 Code blocks in interactive Python
  6.8 File I/O in Python
    6.8.1 Reading data in
    6.8.2 Command line switches
    6.8.3 Writing data out
  6.9 Functions
    6.9.1 Line by line analysis
    6.9.2 Main program as function
    6.9.3 Scope of variables
  6.10 Combining F90 code with Python
    6.10.1 Brute force f2py method:
    6.10.2 Signature files
    6.10.3 F90 Surgery
    6.10.4 A few more things you need to know:
  6.11 Classes
  6.12 Matplotlib
    6.12.1 A first plot
    6.12.2 Multiple figures and more customization
    6.12.3 Adding text
    6.12.4 Histograms
    6.12.5 Pie Charts
    6.12.6 Basemap
    6.12.7 Contour plots
  6.13 Deeper into NumPy and Scipy
    6.13.1 More on slicing with arrays
    6.13.2 Looping through arrays
    6.13.3 Random Numbers
    6.13.4 Normal Distribution
    6.13.5 Statistics in Python
  6.14 Graphical User Interfaces - GUIs
    6.14.1 Interacting with plots
    6.14.2 Event Handling in matplotlib
  6.15 3D plotting with Python
    6.15.1 Geoscience applications

7 Peter's Python Notes
  7.1 How to multiply two integers
    7.1.1 Declaring variables
    7.1.2 Alternate coding options
  7.2 Numpy
  7.3 Making a trig table using a for statement
    7.3.1 Numpy mathematical functions
    7.3.2 Possible integer/real problems
  7.4 More about Python for loops
    7.4.1 More about formats
  7.5 Input using keyboard
  7.6 If statements
  7.7 If, elif, else constructs
  7.8 Greatest common factor example
  7.9 User defined functions
    7.9.1 Python keywords
    7.9.2 Using functions in separate files
  7.10 Arrays
  7.11 Character strings
  7.12 I/O with files
  7.13 Using tuples and lists
  7.14 Plotting with Python
    7.14.1 Example of computing least-squares line

8 LaTeX
  8.1 A simple example
  8.2 Example with equations
  8.3 Changing the default parameters
    8.3.1 Font size
    8.3.2 Font attributes
    8.3.3 Line spacing
  8.4 Including graphics
  8.5 Want to know more?

9 Postscript plotting
  9.1 PSPLOT Fortran subroutines

Chapter 1

Introduction
This course is intended to help incoming students get up to speed on the various
computing tools that will help them with their research and some of the homework
assignments for other classes. The perspective is largely that of the Geoscience
program at SIO, but we hope that the course is general enough to be useful for
other students as well.
All students should have access to a Mac and an account on the IGPP Mac
network. Please let me know if you do not already have an account. If you are using
your own Mac, you will need to install the following:
Fink
gfortran
GMT (from fink)
XCode
Python
TeXShop
TextWrangler
There are download instructions on the class website.
The planned class schedule is listed in Table 1.1. We will spend a few classes on
UNIX, GMT and other topics, but the bulk of the class will be an introduction to
the Fortran90 and Python programming languages. If you are already experienced
in C or Fortran, you probably don't need to take this class (although you may find
Python and some of the other material and the geoscience examples of interest).

Table 1.1: Class Schedule

Monday                Wednesday           Friday
                                          09/28  Intro
10/01  UNIX           10/03  UNIX         10/05  UNIX
10/08  GMT            10/10  F90          10/12  F90
10/15  F90            10/17  F90          10/19  F90
10/22  F90            10/24  F90          10/26  F90
10/29  F90            10/31  no class     11/02  no class
11/05  F90            11/07  F90          11/09  F90
11/12  UC holiday     11/14  F90          11/16  F90
11/19  F90            11/21  F90          11/23  no class (Thanksgiving)
11/26  Python         11/28  Python       11/30  Python
12/03  AGU week       12/05  AGU week     12/07  AGU week
12/10  TTT playoff

1.1 Scientific Computing at SIO

1.1.1 Hardware

Before 2004 or so, computer hardware used at SIO was of two main types:
1. UNIX Workstations (e.g., Sun, HP, Silicon Graphics, etc.)
   (a) Scientific programming
   (b) Data storage
   (c) General research
   (d) (some word processing, graphics)

2. PCs (Windows machines, Macs)
   (a) Word Processing
   (b) Graphics
   (c) Presentations (talks and posters)
   (d) (some programming)

However, these boundaries are now blurred because many PCs run Linux [1] (an
open source form of UNIX) and Apple has adopted UNIX for their operating system
(starting with OS X).
This permits these machines to be used both for serious scientific computations
and for word processing and more traditional PC activities.

[1] Linux is pronounced similar to "linen" (see http://www.paul.sladen.org/pronunciation/)

Ten to twenty years ago, most geophysics departments had networks of Sun
workstations. Today these have largely been replaced with networks of cheap PCs
running Linux or with Apple machines. At IGPP we now use Macs for everything
except for certain specialized or high-performance computers maintained by
individual PIs. For example, David Sandwell's group maintains some Suns because some
of their processing software runs only on that platform.

1.1.2 Software

Here is a list of programming languages and software used at IGPP (items marked
with * will be discussed in this class).
1. Programming
   (a) *FORTRAN (most IGPP faculty use this, large library of existing code)
   (b) C (more widely taught and used by computer scientists)
   (c) C++ (extended version of C)
   (d) Java (used a lot for web programming, has portability advantages, but
       often not fast enough for serious computing)
   (e) *Python (lots of buzz, Lisa Tauxe likes it)
   (f) MatLab (widely used but commercial product, previously presented in
       this class, but our aim is to replace it with Python)
   (g) HTML for web sites (most people now don't program in raw html, but
       use web designer software, such as DreamWeaver)

2. Community Programs
   (a) Bob Parker programs (plotxy, color, contour)
   (b) *GMT for mapping (UNIX based)
   (c) SAC for seismic analysis
   (d) etc.

3. Commercial Programs
   (a) MS Word for word processing
   (b) MS Powerpoint or Apple Keynote for presentations
   (c) *TeX/LaTeX (no single vendor, different implementations available, files
       are compatible, best option for papers & thesis)
   (d) Adobe Illustrator (for graphics, preparing posters)
   (e) PhotoShop
   (f) Mathematica
   (g) etc.

1.2 Why you should learn a real language

Many students arrive at SIO without much experience in FORTRAN or C, the
two main scientific programming languages in use today. While it is possible to
get by for most class assignments by using Matlab, you will likely be handicapped
in your research at some point if you don't learn FORTRAN or C. Matlab is very
convenient for quick results but has limited flexibility. Often this means that a simple
FORTRAN or C program can be written that will perform a task far more cleanly
and efficiently than Matlab, even if a complicated Matlab script can be kludged
together to do the same thing. In addition, Matlab is a commercial product that
does not have the long-term stability of other languages, nor their large libraries of
existing code that are freely shared among researchers.
Your research may involve processing data using a FORTRAN or C program. If
you do not understand the program, you will not be able to modify it to do anything
other than what it can already do. This will make it difficult to do anything original
in your research. You may resort to elaborate kludges to get the program to do what
you want, when a simple modification to the code would be much easier. Worse, you
may drive your colleagues crazy by continually requesting that the original authors
of the program make changes to accommodate your wishes.
Finally, you will be in a more competitive position to get a job after you graduate
if you have real programming experience.

1.3 Learning to program

Learning to program can be intimidating, particularly if you have no previous
experience. The worst thing to do is to buy a book on the language and try to read
it. There is simply too much material and it seems overwhelming. DO NOT DO
THIS! Instead, begin by writing the simplest possible program and get it to work.
Then read just enough to add a single additional feature to your program and get
that to work. You only really learn when you get your own programs to work, so
the idea is to write lots of little programs that do various things and only gradually
add in new concepts. There is no need to learn everything that the language will
do all at once.
Part of learning to write more complicated programs is to figure out ways to
logically divide the problem into smaller pieces. This will come with experience.

The final project in this class will be to write a program that plays tic-tac-toe. This
is a daunting task if one tries to write it all at once. But we will divide it into
smaller parts for separate assignments over the weeks, and only put all the pieces
together for the final program at the end.
Also, be aware that most languages have far more features than you actually need.
Few of us except professional programmers learn them all. But we do not need to
write professional programs; we only need to learn to write practical programs that
serve our own needs. For example, I probably only know about 20 UNIX commands.
A real computer nerd would find pathetic some of the ways that I do things. But
it's been enough for me to get by and to do the science that I want to do. (Having
said that, it wouldn't hurt for me to learn more and I'm going to look over some of
Duncan Agnew's notes from the 2009 class to find some new tricks.)

1.4 FORTRAN vs. C

Some years ago when I (Peter) first started teaching programming in this class, I had
to decide whether to teach C or Fortran. Like most SIO faculty of my generation or
older, I was experienced in FORTRAN but had little exposure to C. After talking to
some people who know both FORTRAN and C, however, I decided to bite the bullet
and learn enough C to teach this class. This was motivated in part by the fact that
C is now one of the standard languages taught in computer science departments;
many of our incoming students have C experience but few have Fortran experience.
Here is my summary of the advantages and disadvantages of either choice:
FORTRAN
  Advantages
    Large amount of existing code
    Preferred language of most SIO faculty (most faculty are old)
    Complex numbers are built in
    Choice of single or double precision math functions
  Disadvantages
    Column sensitive format in older versions
    Dead language in computer science departments
C
  Advantages
    Large amount of existing code
    Preferred language of incoming students, some younger faculty
    Free format, not column sensitive
    More efficient I/O
    Easier to use pointers
  Disadvantages
    Less user-friendly than FORTRAN (I think so, but others may debate this)
    Fewer built in math functions (but easy to fix)
    No standard complex numbers (but easy to fix)
    Easier to use pointers
With reasonable compilers, both languages are equally fast. Ultimately, there
are fixes for both languages that permit them to assume the advantages of the other
language. For example, you can use pointers in FORTRAN90 and you can define
an add-on set of functions to do complex arithmetic in C. However, after learning
enough C to teach the class, I concluded that C is not a very user friendly language
for those without programming experience. So I switched to Fortran90, an improved
version of Fortran77, that combines the advantages of Fortran and C into a more
user-friendly package.

1.5 Python

Python is an up-and-coming programming language that is gaining popularity.
It is very flexible and combines many of the advantages of traditional programming
languages (e.g., C and Fortran) with object-oriented languages (e.g., Java) and with
scripting languages (e.g., UNIX shell scripts). It also has more integrated graphics
options in its standard library than C or Fortran (which require use of non-standard
plotting packages). Indeed it has so many features that it can be daunting for
beginning programmers. For this reason, we are introducing it after the F90 part of
the class.
Python is free and runs on almost all computers. An interesting feature is that
indentation is required: it takes the place of the block delimiters that other
languages use in if statements and do/for loops.
The idea is to enforce good programming style. Python was started in 1989 by Guido
van Rossum from the Netherlands, who has been given the name "Benevolent
Dictator for Life" (BDFL) by the Python community. The name Python comes
from the old Monty Python TV series and Monty Python references are common in
example code.

Python is a dynamic programming language, which is not compiled before it is
run (C and Fortran programs are first compiled). Other examples of dynamic
languages include BASIC and Matlab. Dynamic languages generally run much slower
than compiled languages, which can be a disadvantage for serious data processing
and number crunching. However, it is possible to call C or Fortran subroutines from
within Python and we will learn how to do this. This provides a way to combine
the speed of Fortran with the graphical capabilities of Python.

Peter Shearer, pshearer@ucsd.edu, x42260, Munk Lab, IGPP

Class web site: http://mahi.ucsd.edu/class233/


Chapter 2

UNIX introduction
You will very likely do your scientific computing in a UNIX environment. UNIX
is by far the most common operating system for the workstations that dominate
today's scientific computing. There are many different versions of UNIX. In this
class we will be using OS X on the Macs, which is an Apple version of UNIX. Some
of us still do some work on the Suns, which use Solaris (also UNIX). There is also a
free version of UNIX, called LINUX (pronounced "lynn exs"), that will run on most
PCs.

2.1 Getting started

To use UNIX on the Macs, you will need to bring up a regular text-based window
rather than the standard Mac windows. This can be done by running the Terminal
program, which is normally in Applications/Utilities. We will assume in these notes
that you are entering UNIX commands within such a window. In many cases you
will need to have X11 (called XQuartz in the newest Mac OS) also running in order
to display graphics, etc. Thus I recommend that you always run X11 first, and then
Terminal to create the UNIX text window that you will use. (X11 also has a text
window but its scroll bar is not as nice as the one in the Terminal application).
X11 (XQuartz) can be downloaded for free from Apple if you do not already have
it installed on your machine.
A note about UNIX shells: UNIX comes in different flavors. We are going to
assume that you are running what is called the C shell (csh or tcsh). However
it is possible that your Mac is running something called bash, which many people
think is better than csh. The default shell for Macs changed from tcsh to bash
between Jaguar and Panther (OS X 10.2 to 10.3). However because these notes are
based on csh, you should make sure your Mac is running tcsh (if you think bash is
better, then you probably are more of a UNIX expert than me and don't need to
be taking this part of the course anyway!). To find out what you are running, just
look at the top of the Terminal window and it will say. Alternatively, type
printenv
within your UNIX shell and it will tell you lots of interesting things, including what
your SHELL is. To set things up so that you are always in tcsh, run the Terminal
program and select "Preferences" from the pull-down menu under "Terminal" at
the top of the screen. Select "Execute this command (specify complete path)" and
enter /bin/tcsh in the little box. Then just close the Terminal Preferences box
and restart Terminal.
To learn more about UNIX shells on the Macs, check out:
http://www.macdevcenter.com/pub/a/mac/2004/02/24/bash.html
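
The printenv listing can be long. As a small sketch (the grep filter and the fallback message are additions for illustration, not part of the original notes), the one line of interest can be picked out directly:

```shell
# printenv prints every environment variable; grep keeps only the SHELL line.
# The "|| echo" fallback is an illustrative addition for the case where
# SHELL happens not to be set.
printenv | grep '^SHELL=' || echo "SHELL is not set in this environment"
```

The same pipe should work unchanged in both tcsh and bash, since it uses only the pipe and the || operator.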
I am by no means an expert on UNIX; probably there are several of you here
who know much more than I do. I have learned only enough to get by and could
benefit from learning more. So I'm just going to outline the basics here for the
benefit of those students who have not been exposed much to UNIX.
The UNIX operating system was originally designed to run on mainframe computers
where security was a big issue. You don't want users to be able to delete
other users' files or do other nasty things. So there are a lot of security features.
The first of these is the login and the password. You will be assigned a login name
and password. The first thing that you should do after you log in is change your
password so that only you know what it is. This normally happens automatically
upon your first login. If not, or if you later want to change your password, on the
Macs, go to "System Preferences" and click on "Accounts". To change your password
on more traditional UNIX machines, use the command:
passwd
You should choose a password with numbers or special characters as well as letters.
Do not choose a common word that can easily be guessed by outside hackers. The
NetOps people here are very concerned about security issues and IGPP computers
have been attacked several times already, although no damage has been done.


Do not write your password down (like on a note next to your computer!) where
it can be found. Do not tell other students your password. You can easily give them
access to read your files without giving them the password. Of course all this makes
it very hard to remember your password if you don't use the computer very often....
Let us assume that you have successfully logged on. You will now be located in
your "home" directory. UNIX uses a directory tree structure, similar to the folders
used by PCs but without all the fancy icons. Normally you will have a cursor prompt
that tells you what machine you are running on. In my case, this looks like:
shearer@katmai 5>
because my laptop (my primary computer) is called katmai. Your default prompt
is likely to be different from this. If you want to have one like mine, enter
set prompt = "%n@%m %h> "
where %n will give your username, %m your machine name, and %h the line (command) number.
You will find it convenient to put this in your .cshrc file (see below). The notes that
follow were written when I used a now defunct Sun computer named rock and so
have "rock" as the standard prompt.

2.2 Basic commands

To find out where you are, enter pwd which stands for Print Working Directory.
In my case, this will give:
rock> pwd
/net/rock/shearer
This shows that I am located in my home directory (named shearer) on rock.
You can list the contents of your current directory with the ls command:
rock> ls
(I'm not showing you the output because it's too messy in my case!)
UNIX has an online set of manuals that may be accessed with the man command.
For example, suppose you forgot what pwd does. You could type "man pwd" and
you would get a description of the pwd command. One annoying aspect
of standard UNIX is that the man command shows its output through a pager with
vi-like key commands rather than as normal window output, which would permit
scrolling up and down in the window. When you enter "man pwd" you will get a
page of output on the screen
with a colon (:) at the bottom. If you press the space bar, you will then get the
next page of output, etc. But you can't scroll backwards using the window scroll
arrows. To go backwards, enter "b" at the colon prompt. When you get to the final
page, you will see END at the bottom. To leave the man pages and return to the
normal terminal, hit the "q" key (for quit) at any point.
If you find this too annoying to deal with, you can always save the man pages
to another file. To do this, enter:
man pwd > man.pwd
The output will now be directed into a file named man.pwd rather than to the
screen. To look at man.pwd, use the cat command:
cat man.pwd
The cat command prints a text file to the screen. The advantage in this case is
that you can use the scroll bar. Note that if you try to look at man.pwd using a
text editor, you will likely see all kinds of weird control characters that are used
to format the screen output.
Of course, you risk cluttering up your directory with lots of files named
man.whatever if you do this a lot. If you don't want to worry about deleting them,
one solution is
to use the name "junk" as a temporary place to store output. If you already have a
file named junk, it will just overwrite the old file. This way, you only ever have one
scratch file in your directory at one time.
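
The scratch-file idea can be sketched as follows. The temporary directory (made with mktemp) and the echo commands are stand-ins chosen for illustration so that the sketch touches no real files. The variable assignment uses sh/bash syntax; in tcsh that one line would be written with set instead.

```shell
# Work in a throwaway directory (an addition for safety, not from the notes).
scratch=`mktemp -d`
cd "$scratch"

# Redirect the output of two different commands to the same file, junk.
echo "output of the first command" > junk
echo "output of the second command" > junk   # ">" replaces the old contents

cat junk   # shows only the output of the second command
ls         # junk is the only file in the directory
```

Because ">" replaces the file each time, nothing accumulates; anything worth keeping should be redirected to a file with a more specific name.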
Another approach, if you have access to the web, is to simply Google "unix man pwd"
to bring up the man pages, nicely formatted!
There usually are options to UNIX commands that can make them more useful
for what you want. For example, ls simply lists the file names in your directory.
If you want to find out how big they are or when they were last modified, then use
ls -l where the l stands for the long output option. In your home directory, you
will have a very important file called .cshrc (see below). This file will not appear
if you just enter ls. To make it appear, enter ls -a where the a stands for the
all file option.
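
As a hedged illustration of these options, the sketch below builds a small scratch directory and lists it in several ways. The file name notes.txt and the use of mktemp are assumptions made for the example, and the empty .cshrc here is only a stand-in for the real file in your home directory. The variable assignment is sh/bash syntax (tcsh would use set).

```shell
# Build a throwaway directory with one ordinary file and one "dot" file.
scratch=`mktemp -d`
cd "$scratch"
touch notes.txt .cshrc

ls        # plain listing: file names only, dot files are left out
ls -a     # "all": also lists .cshrc, plus the . and .. entries
ls -l     # "long": adds permissions, owner, size and modification time
ls -la    # single-letter options can be combined
```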


To change your directory, use:


cd dirname
This will move you to the directory dirname. This directory name must be in your
current directory. Or you can give the full path name for dirname, i.e.
cd /net/rock/shearer
will get me to my home directory no matter where I am on the system. Alternatively,
one can go to one's home directory by entering
cd
You can also go directly to subdirectories by entering:
cd ~/dir1/dir2
This will get you to directory dir2, located in dir1 in your home directory. Naturally
this does the same thing as
cd
cd dir1
cd dir2
You can go back up one level by entering
cd ..
You can go back up and then down again into a different directory by entering:
cd ../dir2
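The cd variants above can be tried safely in a scratch area. This is only a sketch: it is written for a Bourne-type shell (sh or bash) rather than csh so that it runs as a standalone script, and the directory names dir1, dir2, and dir3 are made up for the illustration.

```sh
# Scratch directory standing in for a home directory (names are hypothetical)
top=$(mktemp -d)
mkdir "$top/dir1" "$top/dir1/dir2" "$top/dir1/dir3"

cd "$top/dir1/dir2"              # full path name: works from anywhere
cd ../dir3                       # up one level, then down into a sibling
sibling=$(basename "$(pwd)")     # dir3

cd ..                            # up one level
parent=$(basename "$(pwd)")      # dir1

echo "after 'cd ../dir3' we were in $sibling; after 'cd ..' we are in $parent"

cd /                             # leave the scratch area before deleting it
rm -rf "$top"
```

Entering pwd after each cd is a cheap way to confirm where you ended up.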
To create a new directory, use the make directory command:
mkdir dir1
It is often convenient to use a different naming convention for your directories in
order to distinguish them from your files. Some people put .dir after their directories.
For a while I put d. in front of the directory names, which has the advantage of
grouping them together when ls is entered, since UNIX lists things in alphabetical

order. Most recently, I have been using all capital letters for the directory names.
This makes them more visible, but has the disadvantage of slowing down typing
their names (yes, UNIX is case sensitive!).
If you don't want to use special names for directories or if you find yourself
in somebody else's directory where they don't do this, you can use the ls -F
command:
ls -F
This will add a / suffix to directory names and a * suffix to executable file names. I
like this so much that I have set this up as my default ls command by putting an
alias into my .cshrc file (more about this later).
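To see what ls -F reports, here is a small sketch in a scratch directory. It assumes a Bourne-type shell (sh or bash) rather than csh, and the file names PROGRAMS, notes.txt, and runme are invented for the example.

```sh
scratch=$(mktemp -d)
cd "$scratch"

mkdir PROGRAMS                               # a directory
echo "some notes" > notes.txt                # an ordinary text file
printf '#!/bin/sh\necho hello\n' > runme     # a tiny script...
chmod 755 runme                              # ...made executable

listing=$(ls -F)     # directory gets a trailing /, executable a trailing *
echo "$listing"

cd /
rm -rf "$scratch"
```

The listing should contain PROGRAMS/, notes.txt, and runme*, so directories and executables stand out without any special naming convention.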

2.3 Files and editing

Files can be simple ascii (text) files or they can be binary files, often the executable
versions of computer programs. One standard UNIX editor is called vi and is still
used by some of the old-time programmers at IGPP who will insist that it remains
the best editor. If you know all of its tricks it can be an extremely powerful editor.
You can do things, like cut and paste columns of numbers, that most editors can't
do. It also has the advantage of running on any terminal so if you log in from home
you can still edit files.
If you know vi or decide that you want to learn it, more power to you. You will
not lose any points around here. However, vi is not mouse or window friendly and
is not favored by most students today. I have forgotten most of the vi that I once
knew (I used it to do my thesis back in the late Cretaceous).
Editors on the Mac include:
TextEdit: You can find this in the Applications folder. Be sure to set the
format (under Preferences) to Plain text and not Rich text, which will put in
control characters that will screw things up.
nedit: This is supposed to run on all platforms and has more features than
TextEdit.
emacs: This is a very powerful editor that is used by computing professionals.
It can do just about anything but may be somewhat harder to learn at first
than simpler editors.

Xcode: A special editor designed for editing programming code.


You can use any of these editors to create a new file or edit an existing file. Unix
file names are case sensitive. By convention, the type of file is often indicated by
.type, for example:
testprog.f for Fortran77 program source code
testprog.f90 for Fortran90 source code
testprog.c for C program source code
testprog.m for a MatLab script
testprog.o for an object file
figure1.ps for a Postscript file
figure1.gif for a GIF file
You may wish to create your own naming convention to keep track of your files.
When used with wildcards (see below) this will make it easy to list all files of a
particular type.
WARNING: Do not use the dash character (-) in file names; this may cause all
kinds of problems for you. Use . or _ to separate the words. NEVER USE BLANKS
IN FILE OR DIRECTORY NAMES!!! (I know this is common on Macs and PCs
but you will eventually have big problems reading these files with your programs.)

2.4 Basic commands, continued

If you want to remove a file use the rm command:


rm filename
If you want to change the name of a file, use the mv (move) command:
mv filename1 filename2
If filename2 already exists, this will have the possibly undesired consequence of deleting the original filename2. To guard against this, use the -i option:
mv -i filename1 filename2
Now if filename2 already exists, the computer will ask you first if you want to
overwrite this file.
mv can also be used to move files between directories:

mv filename dirname
will move filename into directory dirname (assuming dirname already exists!) where
it will have the same name. Note that this does the same thing as

mv filename dirname/filename
For convenience, you can leave off the /filename if the file is to keep the same
name. A note of caution: In this shortened version, if dirname does not exist as a
directory, then the name of filename will be changed to dirname.
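The note of caution is easy to demonstrate. The sketch below assumes a Bourne-type shell (sh or bash) rather than csh; the names results, resutls (a deliberate typo), run1.out, and run2.out are hypothetical.

```sh
scratch=$(mktemp -d)
cd "$scratch"
mkdir results
echo "data" > run1.out
echo "data" > run2.out

mv run1.out results      # results is a directory: the file moves into it
mv run2.out resutls      # typo, no such directory: the file is RENAMED

moved=$(ls results)      # run1.out
if [ -f resutls ]; then kind="plain file"; else kind="directory"; fi
echo "results/ contains $moved; resutls is a $kind"

cd /
rm -rf "$scratch"
```

After the second mv there is no run2.out any more, only a plain file called resutls, which is exactly the surprise described above.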
The copy command works in a similar way:

cp filename1 filename2
makes a copy of filename1 called filename2.

cp -i filename1 filename2
will first ask if you really want to do this if filename2 already exists. You can copy
files to different directories in the same way as the mv command works.
Many people so prefer the -i option for mv and cp that they make it the default
option by defining an alias in their .cshrc file (see below) so that mv and cp become
mv -i and cp -i. I recommend that you do this; it is likely to save you some
grief in the future.
To remove a directory, use the rmdir command:

rmdir dirname
The directory must first be empty for this to work. To recursively remove a directory
and its contents use:

rm -r dirname
Use this with extreme caution to avoid accidentally deleting more than you intended!

2.4.1 Wildcards

UNIX commands become much more powerful when they are used with the wildcard
character * which can take on any ascii string. For example, suppose you wanted
to list all files ending with .f, the suffix used to identify Fortran source code. Simply
enter:
ls *.f
You could move all of these programs into a subdirectory called source.dir by
entering:
mv *.f source.dir
You might have a bunch of plot files called mypost1, mypost2, etc. You could
delete all of these at once by entering:
rm mypost*
For obvious reasons, be very careful when using * with the rm command. For
example, suppose you wanted to delete all files in your current directory that end
in %, which some text editors use to store the original version of edited files. To do
this, enter:
rm *%
Suppose, however, that you are careless when you type this and enter instead:
rm * %
This will delete everything in your current directory! So always look carefully at
what you have typed before hitting the return key when you are deleting files using
wildcards.
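One habit that helps here (my suggestion, not something the commands require) is to preview a wildcard with echo before handing it to rm. The sketch assumes a Bourne-type shell (sh or bash) rather than csh, and the file names are invented.

```sh
scratch=$(mktemp -d)
cd "$scratch"
touch prog.f prog.f% notes.txt notes.txt% keepme.dat

# echo prints the command line exactly as rm would receive it,
# without deleting anything.
careful=$(echo rm *%)      # rm notes.txt% prog.f%
careless=$(echo rm * %)    # every file in the directory, plus a stray %

echo "$careful"
echo "$careless"

cd /
rm -rf "$scratch"
```

If the echoed list contains anything you want to keep (keepme.dat in the careless case), fix the command before replacing echo rm with rm.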

2.4.2 The .login and .cshrc files

In your home directory, you can have files called .login and .cshrc that are executed
whenever you log in. These files are used to define and customize your environment.
You may have default .login and .cshrc files set up by the NetOps people when you
first log on, but you can modify them to do what you want.
For example, you can put aliases in your .cshrc file for the mv and cp commands
so that you don't accidentally overwrite files:

18

CHAPTER 2. UNIX INTRODUCTION

alias cp cp -i
alias mv mv -i
One of the most important things in the .cshrc file is the set path command.
This lists all the directories that the computer should look in when you try to run
a program. For example, if you type matlab the computer needs to know where
to look to find the matlab program. If you get a "Command not found" message,
then you don't have the right directories listed in your .cshrc file under set path.
Here are example set path commands:
set path=($path /usr/X11R6/bin )
set path=($path /sw/bin )
set path=($path /sw/sbin )
set path=($path /sw/igpp/bin )
set path=($path /Applications )
set path=($path /Applications/Utilities )
set path=($path /Applications/MATLAB6p5p1/bin)
set path=($path /Developer/Applications )

In the past, I recommended that students include the current directory in their
path, i.e.,
set path=($path .)
This has the advantage that you can run programs in the current directory simply
by typing their name, i.e., if you have a program called compspec, you can run it
by typing compspec on the command line. However, NetOps has now convinced
me that this is a security risk because bad people could put malevolent programs
into your directories, with the same name as common UNIX commands (e.g., ls),
which would be executed when you innocently typed them. This means that to
execute compspec within the current directory, you must type ./compspec on the
command line (which really is not that much more work).
You can always look at other people's .cshrc files if you have trouble with yours.
If you make any changes to your .cshrc file (or your .login file), they won't take effect
until your next login. Alternatively you can enter:
source .cshrc
to make the changes immediately. But you will have to do this separately for each
window that you have open.

2.5 Scripts

One of the most powerful ways to use UNIX is to write scripts to run your programs.
This is an easy way to keep track of the input and output parameters and to make
changes without having to enter everything in again. For example, suppose you
have a program called mapfocal that asks you a bunch of questions like this:
shearer@katmai 16> ./mapfocal
reading cmt file...
nev = 24584
Enter output file name
globalcmtmap.ps
Enter fill level, line thickness for conts
0.93 1
Enter line thickness for plate boundaries
5
Enter line thickness for beachballs
1
Enter minimum magnitude
0
Enter circle radius for beach balls (neg. for M0 scale)
0.08
(1) complete plot or (2) outline for test
1
Creating psplot postscript file : globalcmtmap.ps
nplot = 415
shearer@katmai 17>
After running this program, you decide that you would like to change the continent line thickness to 2. This is tedious if you have to type in everything again.
Instead you can write a script to run the program called do.mapfocal that looks
like this:
./mapfocal << MAPFOCAL
globalcmtmap.ps
0.93 1     !fill level, line thickness for continents
5          !line thickness for plate boundaries
1          !line thickness for beach balls
0          !minimum magnitude
0.08       !circle radius for beach balls (neg for mom scale, 0.08 looks nice)
1          !1=complete plot or 2=outline for test
MAPFOCAL
This is just an ascii file that you write with your favorite text editor. Often
people will use ! instead of MAPFOCAL in scripts like this; they seem to work

in the same way. (Can anyone tell me if one convention has any advantages over
the other?)
The comments following the numerical input (e.g., !minimum magnitude)
are convenient if the program that you are running is robust enough to ignore
additional characters on the line that follow the numbers that are actually input into
the program.
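The mechanism at work is the shell's "here document": the lines between << WORD and the closing WORD are fed to the program as if typed at the keyboard. The sketch below assumes a Bourne-type shell (sh or bash) rather than csh. The askplot function is a made-up stand-in for an interactive program like mapfocal, not the real thing; it reads only the first item on each numerical line, so trailing comments are ignored.

```sh
# Hypothetical stand-in for an interactive program: prompts, then reads stdin.
askplot() {
    echo "Enter output file name";   read outname
    echo "Enter line thickness";     read thick rest    # rest swallows the comment
    echo "Enter minimum magnitude";  read minmag rest
    echo "would plot to $outname, thickness=$thick, minmag=$minmag"
}

log=$(mktemp)

# Everything up to the line containing only END becomes the program's input.
askplot << END > "$log"
demo.ps
2       !line thickness
5.5     !minimum magnitude
END

summary=$(tail -n 1 "$log")
echo "$summary"
rm -f "$log"
```

Changing a parameter now means editing one line of the script and rerunning it, rather than retyping every answer.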
To run the script, simply enter:
./do.mapfocal
You may get a message which says:
do.mapfocal: Permission denied
This means that you don't have execute permission on this file. To see what the
permissions are, use the ls -l command:
shearer@katmai 18> ls -l do.mapfocal
-rw-r--r--  1 shearer  669  421 Sep 30 09:20 do.mapfocal*

The -rw-r--r-- shows what the permissions are for the file. Columns 2-4 (rw-) are
your permissions as owner of the file. Columns 5-7 and 8-10 give the permissions for
your group and all others, respectively. The r means read permission, the w means
write permission and x means execute permission. In this case our problem is that
we have rw- instead of rwx. To fix this, use the chmod command:
chmod 755 do.mapfocal
This will give you rwx permission on file do.mapfocal and will give everyone
else rx permission but not w permission. This is probably the most common set of
permissions you will want to set, assuming you are OK with others looking at your
code, as long as they are not allowed to make changes. If you don't want anyone
else to be able to see your file, set
chmod 700 do.mapfocal
For more details see the chmod manual.
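Each digit in the chmod number is a sum of 4 (read), 2 (write), and 1 (execute), given in the order owner, group, others; so 755 is rwx for you and r-x for everyone else. Here is a sketch that shows the permission string after each change. It assumes a Bourne-type shell (sh or bash) rather than csh, and do.demo is a made-up file name.

```sh
scratch=$(mktemp -d)
cd "$scratch"
printf '#!/bin/sh\necho "script ran"\n' > do.demo

chmod 644 do.demo                        # 6=4+2 (rw-), 4 (r--), 4 (r--)
before=$(ls -l do.demo | cut -c1-10)     # -rw-r--r--

chmod 755 do.demo                        # 7=4+2+1 (rwx), 5=4+1 (r-x), 5 (r-x)
after=$(ls -l do.demo | cut -c1-10)      # -rwxr-xr-x

chmod 700 do.demo                        # rwx for the owner, nothing for others
private=$(ls -l do.demo | cut -c1-10)    # -rwx------

echo "$before -> $after -> $private"

cd /
rm -rf "$scratch"
```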

The do.mapfocal script will run the mapfocal program and enter all of the required inputs. Notice that in this case I have added helpful comments to the numerical input lines. You can do this with most programs and it will not affect the input
(at least for FORTRAN, I'm not so sure about C). You usually cannot, however, add
comments to the character inputs (e.g., globalcmtmap.ps in this example) because
it will consider them part of the name.
Now it's easy to keep track of the inputs and to make changes without having
to re-enter everything. You can also put scripts together to run the program many
times:

./mapfocal << !
globalcmtmap1.ps
0.93 1     !fill level, line thickness for continents
5          !line thickness for plate boundaries
1          !line thickness for beach balls
0          !minimum magnitude
0.08       !circle radius for beach balls (neg for mom scale, 0.08 looks nice)
1          !1=complete plot or 2=outline for test
!
./mapfocal << !
globalcmtmap2.ps
0.93 2     !fill level, line thickness for continents
5          !line thickness for plate boundaries
1          !line thickness for beach balls
0          !minimum magnitude
1.1        !circle radius for beach balls (neg for mom scale, 0.08 looks nice)
1          !1=complete plot or 2=outline for test
!

In this case, you can make two different plots from the same script.

2.6 File transfer and compression

You often will want to get files from another computer. There are various ways to
do this. Today, files can often simply be accessed via the web through a web browser,
or by directly mounting the disk of the other computer. An old-fashioned method
of file transfer is the ftp (file transfer protocol) command, which is still used quite
a bit.

2.6.1 FTP command

One classic method of UNIX file transfer is the ftp command. For many public sites
and data centers this is done with anonymous ftp. You simply enter:
ftp othercomputer
where othercomputer is the name (or IP address) of the other computer. When
you are asked to login, just enter anonymous and then your e-mail address as a
password. Of course, if you have login permission on the other computer then you
can ftp to the machine even if it is not set up for anonymous ftp.
Once within ftp, you will get an ftp prompt. You then can use the cd command
to get to the directory that you want and ls to see the file names on the remote
computer. Finally use get to bring the desired file to your own computer and
then quit to exit ftp. If you want to get all of the files in the directory, use the
command mget * and you will be prompted for each file name. If you don't want
to be prompted, you can turn off the interactive mode by entering
ftp -i othercomputer
when you first invoke ftp. If you are getting binary files (rather than simple text
files), you should enter
type binary
at the FTP prompt before getting the files. Because type binary will work with all
types of files, including ascii, it is a good practice to always enter type binary as
soon as you enter ftp. The default for the Suns is type ascii which does not work
with binary files.
If the remote computer objects to regular ftp for security reasons, you might try
sftp, which is the secure ftp command.
If you have just a few files to transfer, you may want to use the secure copy
command:
scp filename *.ps shearer@rock.ucsd.edu:./TRANSFER
will copy filename and all files ending with .ps from the current directory to the
TRANSFER directory in shearer's account on rock.ucsd.edu (after first prompting

for the password). If you want to copy a directory, you need to use the -r (recursive)
option, i.e.,
scp -r dirname shearer@rock.ucsd.edu:./TRANSFER
Note that one can also copy files from the remote computer to the current directory, i.e.
scp -r shearer@rock.ucsd.edu:/net/moray2/scratch/dirname dirname_local
In this case the remote directory was on the /net/moray2 disk; notice how we
specified the complete path name.

2.6.2 File compression

Often the files that you retrieve will be compressed. Files are compressed using the
UNIX compress command:
compress filename
This will change the name to filename.Z which tells you that it is compressed.
This is useful to save disk space when you will not be using the file for a while. To
get back to the original file, use uncompress:
uncompress filename
You can compress a whole bunch of files by using wildcards:
compress *.ps
will compress all of your Postscript files, assuming you use the .ps suffix convention
for these files. These can be uncompressed with uncompress *.ps.Z (the wildcard
must now match the compressed file names, which end in .Z).
An alternative compression method (not standard UNIX but usually available)
is invoked with the gzip command:
gzip filename
This changes the name to filename.gz with the reverse operation:

gunzip filename
The gunzip command will also decompress .Z files (but the uncompress command
will not decompress .gz files).
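A quick round trip shows what gzip does to a file's name and size. This sketch assumes a Bourne-type shell (sh or bash) rather than csh and that gzip is installed; big.txt is a made-up file filled with repetitive text, which compresses well.

```sh
scratch=$(mktemp -d)
cd "$scratch"

# Build a repetitive 2000-line text file
awk 'BEGIN { for (i = 1; i <= 2000; i++) print "0.93 1 5 1 0 0.08 1" }' > big.txt
size_before=$(wc -c < big.txt | tr -d ' ')
sum_before=$(cksum < big.txt)

gzip big.txt                                   # big.txt is replaced by big.txt.gz
size_gz=$(wc -c < big.txt.gz | tr -d ' ')

gunzip big.txt.gz                              # big.txt.gz is replaced by big.txt
sum_after=$(cksum < big.txt)

echo "original: $size_before bytes, compressed: $size_gz bytes"

cd /
rm -rf "$scratch"
```

The checksum before and after is the same, so nothing is lost by compressing; real data files will usually shrink less than this artificial one.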
Note: On Macs, most compressed file formats can be uncompressed by simply
double-clicking on them.
You may find it useful to use compression yourself by compressing files that you
do not use very often in order to save space. I often do this when I want to get some
disk space but am too nervous to delete the files and too lazy to write them to a
backup tape.

2.6.3 Using the tar command

Often you may want to save or retrieve an entire directory of files. This is most
easily done using the tar command. If you are within the directory containing the
files that you wish to save, then enter:
tar -cvf ../archive.tar .
The arguments are as follows:
-c               create tar archive
-v               verbose: print file names to screen
-f               output file name will follow
../archive.tar   name for tar file (../ to put in next level up)
.                tar every file in current directory

Alternatively you can save the entire directory and its contents from the level
above the directory:
tar -cvf programs.tar programs.dir
The tar file can then be FTPed to another machine. For even more efficiency
you may wish to compress the tar file first. The files can be retrieved as follows:
tar -xvf archive.tar
This will put all of the files in the archive into the current directory.
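Here is a sketch of the whole cycle (create, list, extract) in a scratch area. It assumes a Bourne-type shell (sh or bash) rather than csh; the directory restored and the two Fortran file names are invented. The -t option, not mentioned above, lists an archive's contents without extracting anything.

```sh
scratch=$(mktemp -d)
cd "$scratch"
mkdir programs.dir restored
echo "program main" > programs.dir/prog1.f
echo "program test" > programs.dir/prog2.f

tar -cf programs.tar programs.dir       # create the archive from one level up
contents=$(tar -tf programs.tar)        # list what is inside
echo "$contents"

cd restored
tar -xf ../programs.tar                 # extract into the current directory
restored=$(cat programs.dir/prog1.f)    # the directory structure comes back too
echo "$restored"

cd /
rm -rf "$scratch"
```

Listing with -t before extracting is a good habit, since extraction writes into whatever directory you happen to be in.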
The tar command was originally written largely to write and retrieve backups,
etc., onto tape, in which case the file name (e.g., ../archive.tar, programs.tar, etc.) in

the above examples is replaced with the name for the tape device. Today hardly
anyone uses tape drives because large capacity hard drives have gotten so cheap.
The Exabyte and DAT tape drives we used to have in the barnyard have been
retired. If your advisor hands you an old data tape, you might check with NetOps
to see if they can read it, as they still have a few old tape readers.

2.6.4 Remote logins and job control

Often you will want to run on a different machine than the one that you are sitting
at. The other machine may be faster, have more memory, be connected to a tape
drive, etc. To do this enter:
ssh machinename
and you can log in and run remotely on this machine. Of course this will slow the
machine down for anyone else using it so use some courtesy in doing this. One way
to do this is to start your job with the nice command:
nice do.bigjob
where do.bigjob is the script that runs your program. The nice command lowers
the priority of your job so that it will not interfere with others using the machine.
You still may be unpopular, however, if you use a lot of memory on the machine.
Niceness levels range from 1 (highest) to 19 (lowest). The default for the nice
command is niceness 4 or 10 (depending upon which UNIX shell you are running).
To set the niceness to a specific value, you can specify a number, e.g.,
nice +15 do.bigjob
will run do.bigjob at niceness 15.
As a word of caution, most people don't like to have other people run jobs on
their personal machines (the ones in their offices) without permission, even if the
jobs are set to run at large niceness values.
To find out what jobs are running on your machine, you can use the top command to list the most active jobs:
top

This will take over your window and update the results continuously until you
enter q for quit. The Process ID (PID) is listed, together with the username, the
niceness, the fraction of the CPU being used and other useful information. top is
interactive and you can input various commands (? for help, u to see only one user,
etc.). You can stop (kill) jobs from within the top program, or at the command line
using the kill command (see below).
If you did not originally use nice when you started a job you can renice the job
from within top by entering r. Alternatively, if you know the job number you can
change the niceness at the command level, e.g.,
rock> renice +15 1132
where 1132 is the PID number (get from top program). You can renice more than
once, but only to raise the niceness level; you can never lower the niceness level
once it is set.

2.7 Miscellaneous commands

Unix keeps track of your previous commands. To see them, enter


history
and it will list your last 30 commands. To repeat a past command, enter
! followed by the line number in the history list or simply the first few letters of the
command. To repeat the very last command, enter !!
Want to see what the beginning or end of a file looks like? Use the head or tail
command:
head filename     ---lists the first 10 lines of the file
tail filename     ---lists the last 10 lines of the file

To see a file one page at a time, use the more command:


more filename
and then hit the space bar to advance one page at a time.
Want to see how many lines and words are in a text file? Use the word count
(wc) command:

wc filename
This will list the number of lines, words, and bytes in the file. Often I just want to
know the number of lines, in which case it will run faster to use the -l option for
line:
wc -l filename
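These three commands are easy to try on a file whose contents you know. The sketch below assumes a Bourne-type shell (sh or bash) rather than csh; it builds a 25-line test file with awk. The -n option, which sets the number of lines shown by head or tail, is standard but was not mentioned above.

```sh
f=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 25; i++) print "line", i }' > "$f"

first_ten_end=$(head "$f" | tail -n 1)      # last of the first 10 lines: line 10
last_ten_start=$(tail "$f" | sed -n '1p')   # first of the last 10 lines: line 16
top_three=$(head -n 3 "$f")                 # only the first 3 lines
count=$(wc -l < "$f" | tr -d ' ')           # 25

echo "head ends at '$first_ten_end', tail starts at '$last_ten_start'"
echo "the file has $count lines"
rm -f "$f"
```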
Lose track of where a file is? You can use the find command:
find . -name filename -print
This will look in the current directory (that's what the . is for) and below for
the file named filename and then print where it is. What if you only know some
of the characters of the file? You can use a wildcard:
find . -name 'map*' -print
This will find all files that begin with map and print them on your screen. Note
that you MUST enclose map* in apostrophes for this to work. Of course, most
computers now have built in search programs (e.g., Mac Spotlight) to locate files,
which may be more useful in many cases.
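The reason for the apostrophes is that, without them, the shell expands map* against the current directory before find ever sees it. Here is a sketch with the quoted form, assuming a Bourne-type shell (sh or bash) rather than csh; the directory and file names are invented.

```sh
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p PLOTS/old
touch map1.ps PLOTS/map2.ps PLOTS/old/mapfocal.f PLOTS/notes.txt

# Quoted: find itself matches map* against names at every level.
found=$(find . -name 'map*' -print)
nfound=$(echo "$found" | wc -l | tr -d ' ')
echo "$found"
echo "$nfound files found"

cd /
rm -rf "$scratch"
```

All three map files are found, including the two in subdirectories. Unquoted, the shell would first turn map* into map1.ps (the only match in the top directory), and find would look for that one name only.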
Unix has many powerful utility programs. One of these is the sort command
to sort files in alphabetical or numerical order. Example:
sort +4 -n -b -r file1 -o file2
This sorts file1 and outputs the results to file2. The following options are used
in this example:
+4    skip first 4 fields (leave out to use beginning of line)
-n    numerical order (default is alphabetical order)
-b    ignore leading blanks
-r    reverse order (leave out for standard order)

NOTE: The +4 option no longer seems to work. Instead you set the field number
(normally the column number) using the -k option, i.e.
sort -k 5 -n -b -r file1 -o file2

should do the same thing as +4, i.e., skip 4 columns and sort on the 5th column.
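As a check on the -k form, here is a sketch that sorts a small made-up table on its 5th column, largest first. It assumes a Bourne-type shell (sh or bash) rather than csh, and the station names and numbers are invented purely for the illustration.

```sh
f=$(mktemp)
cat > "$f" << END
sta1 10 20 30 4.2
sta2 10 20 30 6.1
sta3 10 20 30 5.7
END

# -k 5: sort key starts at field 5; -n: numeric; -r: largest first
sorted=$(sort -k 5 -n -r "$f")
biggest=$(echo "$sorted" | sed -n '1p')

echo "$sorted"
rm -f "$f"
```

The line with 6.1 in the last column comes out first, followed by 5.7 and 4.2.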
To find out how much disk space is available use df (disk free):
df
This will not list all the disks on the system unless they have been mounted. Just
go to the disk that you are interested in and retry df if it does not appear the first
time.
To see how much space you are using, the best command is
du -ks *
This will list the disk usage of each of your subdirectories. It is a good idea to
go through your directories once a month or so to delete unnecessary files and/or
compress large files that you don't use very often.
To check for misspelled words in a file, use:
spell filename
This will list all words not found in a dictionary (which one?). Use the -b option
for the British spelling if you are submitting a paper to Nature. (NOTE: spell does
not seem to work on the Macs!)
To reformat a text file to a uniform maximum line length of 72 characters, use:
fmt file1 > file2
This assumes that blank lines separate paragraphs. Line feeds within a paragraph
are removed and added as necessary to make the lines of approximately equal length.
This command is a useful feature if you use regular unix mail and create your
outgoing messages with a text editor. It is helpful when you want to change a line
in the middle of a paragraph because you don't have to redo all the following lines
in order to make things look nice.
To preview a postscript file on the screen use:
gv filename.ps
which is an abbreviation for the old ghostview program.

2.7.1 Common sense

This should go without saying, but you never know. Do not download pornography,
hate literature, etc., on UCSD computers. You may get into BIG trouble (worse,
you could get your advisor into trouble!).
You should not consider e-mail to be completely private. Even deleted e-mail is
usually still on the computer system somewhere, often on daily and weekly backup
tapes. Be very careful when you reply to messages sent to a group of people that
you do not send your message to everyone on the list (unless that is your intention).
Attempts at sarcasm and irony usually misfire in e-mails. It's best to err on the
side of professionalism in your communications with your colleagues.

2.8 Advanced UNIX

This writeup so far contains pretty much all I've ever done with UNIX. You can go
pretty far with only 20 commands or so. However there are much more powerful
things you can do. If you really want to be a UNIX guru, try reading up on pipes,
grep, awk, sed, etc. Then you will be able to write custom scripts to do all kinds of
neat things. Here are some examples:
grep dziewonski filename
This will print every line from file filename that contains the string dziewonski.
grep -v dziewonski filename
This will print every line from file filename that does NOT contain the string
dziewonski.
grep dziewonski filename > dz.lines
This works as above except the lines containing dziewonski are written to the file
dz.lines rather than printed to the screen.
grep dziewonski *
This lists lines containing dziewonski from ALL files within your current directory.
However, you may really want just the file names of the files that contain the string
dziewonski, in which case the following command will work better:

grep -l dziewonski *
This lists the file names of the files that contain the string dziewonski.
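Keep in mind that grep is case sensitive unless told otherwise. The sketch below assumes a Bourne-type shell (sh or bash) rather than csh and uses two invented files; the -c option (count matching lines) and -i option (ignore case) are standard grep options that were not covered above.

```sh
scratch=$(mktemp -d)
cd "$scratch"
printf 'a talk by Dziewonski\nsee dziewonski for details\nunrelated line\n' > refs.txt
printf 'nothing relevant here\n' > other.txt

exact=$(grep -c dziewonski refs.txt)       # 1: the capitalized line is missed
anycase=$(grep -ic dziewonski refs.txt)    # 2: -i ignores case
without=$(grep -vc dziewonski refs.txt)    # 2: lines NOT containing the string
files=$(grep -l dziewonski *)              # refs.txt

echo "exact=$exact anycase=$anycase without=$without files=$files"

cd /
rm -rf "$scratch"
```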
grep ezxplot `find . -name Makefile -print`
This is one that recently saved me when I could not find a program that I knew
I had written, but I did not know its name or which directory it was in. I knew,
however, that the program was compiled with a Makefile that would contain ezxplot
because this is necessary to make the program work. The grep command searches
for ezxplot in each of the files whose names are returned by the find command. Note that the find
command must be enclosed with backward apostrophes, not regular apostrophes.
ps -eaf
This lists all processes that are running on your machine.
ps -eaf | grep shearer
This lists all of the processes that contain the string shearer in the output lines
of ps. In this case, the vertical line is a pipe that directs the output of the first
command, ps, into the input of the second command, grep.
ps -eaf | grep shearer > junk
As above, but writes the output into file junk rather than to the screen.
kill PID
This kills a job with process ID number PID (obtained from the top or ps command). This is useful for runaway jobs. For stubborn jobs, use the -9 option:
kill -9 PID
You can compare two files using the diff command:
diff file1 file2
This lists all differences between file1 and file2. This is useful if you have made some
changes to a file but cannot remember exactly what they are.

2.8.1 Some sed and awk examples

cat file1 | sed 's/Peter/Paul/' > file2
    Copy file1 to file2, substituting "Paul" for the first "Peter"
    on each line.
cat file1 | sed 's/Peter/Paul/g' > file2
    Copy file1 to file2, substituting "Paul" for every "Peter"
    on each line (note the "g" flag for global substitution).
cat file1 | sed 's/^/Paul says /' > file2
    Insert the prefix "Paul says " at the beginning of each line.
    Note that "^" means start of line.
cat file1 | awk '{print $5,$3,$1}' > file2
    Assuming file1 has 5 columns of figures, this copies columns
    5, 3, and 1 to file2, in that order, omitting the 2nd and
    4th columns.
cat file1 | awk '{print $2, $1*(-10)}' > file2
    Switch columns 1 and 2 and multiply original 1st column
    numbers by -10.
cat file1 | awk '{print substr($0,1,10) substr($0,31,10) substr($0,21,10) substr($0,11,10) substr($0,41,length($0)-40)}' > file2
    This copies file1 to file2, swapping the contents of columns 11-20
    and columns 31-40. $0 is the line, substr takes a chunk of it
    (starting column, then number of characters), and the last part
    prints the remainder of the line.
cat file1 | awk '{print "Columns 11 to 20 are " substr($0,11,10)}' > file2
    Begin each line with "Columns 11 to 20 are " and then
    list the contents of those columns in the input file.
Variations on the last few examples can be used to reformat ascii data files,
including those that do not have spaces between fields.
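It is worth testing one-liners like these on a tiny file before trusting them with real data. The sketch below assumes a Bourne-type shell (sh or bash) rather than csh, and the test lines are invented. It also reads the file directly instead of going through cat, which sed and awk both allow.

```sh
f=$(mktemp)

printf 'Peter met Peter\n' > "$f"
first=$(sed 's/Peter/Paul/' "$f")                 # Paul met Peter
all=$(sed 's/Peter/Paul/g' "$f")                  # Paul met Paul
prefixed=$(sed 's/^/Paul says /' "$f")            # Paul says Peter met Peter

printf '1.5 20 xx 40 yy\n' > "$f"
cols=$(awk '{print $5, $3, $1}' "$f")             # yy xx 1.5
scaled=$(awk '{print $2, $1*(-10)}' "$f")         # 20 -15

printf '0123456789ABCDEFGHIJ\n' > "$f"
chunk=$(awk '{print substr($0, 11, 10)}' "$f")    # columns 11-20: ABCDEFGHIJ

echo "$first | $all | $prefixed | $cols | $scaled | $chunk"
rm -f "$f"
```

Note that the second number given to substr is a length, not an ending column, so columns 11 to 20 are substr($0,11,10).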

2.9 Example of UNIX script to process data

By combining UNIX commands into a script, it is possible to create very powerful
tools for processing data. Consider a simple example where we have a number of
data files contained in a data directory:
rock> cd data.dir
rock> ls
data1 data2 data3
We have written a program to process the data in these files and write new data
files which we might want to call data1.proc, etc. The program is called procdata

and prompts the user for an input file name and an output file name, and, in this
simple example, a multiplier factor to scale the data. If the program is one level up
from the data directory, then:
rock> ../procdata
Enter input file name
data1
Enter output file name
data1.proc
Enter multiplier factor
3
rock>
To process all of the files in the directory, we could run the program for each
file, manually entering the file names. However, clearly this would get very tedious
if we had lots of files to process. We could modify the program to accept a list of
file names, but perhaps it is a complicated program that someone else wrote that
we don't want to mess with. Another approach is to write a UNIX script to run the
program for all of the files in the directory. Here is one way to do this, using the
command file do.proc, which looks like this:
#! /bin/csh
\rm procdata.log
\rm data.dir/*.proc
ls data.dir > filelist
cd data.dir
# Note the "backwards" apostrophes in next line, regular ones will not work!
foreach filename (`cat ../filelist`)
   echo "processing file:" $filename
   ../procdata >>! ../procdata.log << !
$filename
$filename.proc
2
!
end
cd ..
\rm filelist

This is designed to be located in the same directory as the program, one level
up from the data directory. The screen output from the program (all of the "Enter
input file name", etc., lines) is directed to a file called procdata.log so the first
thing we do is remove any old version of this file, if it exists. Note that within the
script, we use \rm instead of rm so that any aliases that might require
interactive verification of the deletions are not performed. Otherwise the computer
might prompt us to see if we really want to delete the files and the script would not
be prepared to handle this.
Next, we remove any existing processed files in the data.dir directory, using a
wildcard and assuming that the file names end in .proc.
Next, we write a list of the data file names within data.dir into the file filelist.
Then we go into data.dir where we loop over the filenames contained in filelist,
using the foreach command. This loop is terminated by the end command later
in the script. The
`cat ../filelist`
(be sure to use backward apostrophes!) returns the names listed in filelist, and foreach
assigns them one at a time to the filename variable. The backward apostrophes indicate that a UNIX
command is to be executed. We use the echo command to output to the screen each
file that is being processed.
We then run the procdata program (one level up so we need the ../) and direct
the normal screen output to the logfile. We use >>! rather than >> in case
set noclobber is contained in our .cshrc file (set noclobber prevents overwriting an
existing file with > or writing to a nonexistent file with >>, the latter case being
our situation. Note that >! also overrides the noclobber setting).
Within the foreach loop we refer to the contents of the filename variable as
$filename. The << ! on the line that runs procdata tells the shell to feed the lines
that follow to the program as if they were typed at the keyboard; this input is
terminated within the script by the line containing only the ! symbol.
Following the end statement that completes the loop over the files, we go back
to the directory containing the program and delete filelist, as it is no longer needed.
The power of this script is that it can be run on a directory containing thousands
of files, just as easily as for a smaller number of files.
In this example, we really did not need to generate the file filelist because we
eventually deleted it. Thus, we could have written the script as:
#! /bin/csh
\rm procdata.log
cd data.dir
\rm *.proc
# Note the "backwards" apostrophes in the next line, regular ones won't work!
foreach filename (`ls`)
echo "processing file:" $filename
../procdata >>! ../procdata.log << !
$filename
$filename.proc
2
!
end
cd ..
Alternatively, the `ls` could be written as *, i.e.,
foreach filename (*)
will also work. In this case the wildcard * expands to the names of all of the files
within the current directory.
We won't have time in this class to go into the details of all of the different
things one can do in scripts like this. There are lots of books on UNIX that one can
consult for this purpose (but who has time to read them?); most of us just pick
up stuff as we need it. The main point that I want to get across is that if you are
spending lots of time running programs manually, then you are wasting your time.
Spend some of that time learning how to write a UNIX script and you will be far
better off in the long run. Your work will be better documented and it will be much
easier for you (and others) to reproduce your work.

2.10 Common UNIX command summary

cat filename          print filename on your screen

cd dirname            change directory to dirname
cd ..                 go back up one level
cd                    go to home directory
cd ~/dirname          go to directory dirname in home directory
cd ~otheruser         go to home directory of otheruser

cp file1 file2        copy file1 to file2
cp -i f1 f2           copy f1 to f2 but ask before overwriting f2

df                    list disk space on the different disks
du -ks *              list disk usage for files/dirs in current directory

lpr -P silo filename  send filename to printer silo

ls                    list files in directory
ls -l                 list files in long format (sizes, dates, permissions)
ls -a                 list all files including those starting with .
ls -F                 flag directories by adding slash to their name
ls *.f                list all files ending with ".f"

mkdir dirname         make directory dirname

mv file1 file2        change name of file1 to file2
mv -i f1 f2           change name of f1 to f2 but ask before overwriting existing f2
mv *.f src            move all files ending in .f into existing directory src

pwd                   print working directory

rm filename           remove filename
rmdir dirname         remove directory dirname (directory must be empty)

wc filename           count lines, words, and characters in file
wc -l filename        count lines in file

Chapter 3

Generic Mapping Tools


Generic Mapping Tools, frequently referred to as GMT, is a set of UNIX tools for
making maps and plots written by Paul Wessel and Walter Smith. This software is
freely available and is still maintained by Wessel and Smith. The latest update
(Version 4.5.7) was released on July 15, 2011.
If you haven't yet installed GMT, I'm not going to lie, it is a pain. The best
way on a Mac is to use a utility called Fink, which itself must be installed and is also a
pain. The good news is that once you have installed Fink, there are a LOT of other
programs available to you for free and keeping everything up to date is pretty easy.
Here are some instructions to help you through the installation process.

3.1 Installation of Fink, GMT, and other useful tools

Download the Fink source code for your operating system via this link:
http://www.finkproject.org/download/srcdist.php
Double click on the tar file that you just downloaded if it hasn't been unpacked
automatically. Open a terminal window and change directories into this folder (in
the test case I just did, ~/Downloads/fink-0.31.3).
% cd ~/Downloads/fink-0.31.3
Start the installation operation with the supplied bootstrap script. Note that
with all these scripts, just select the default options by hitting the return key.
% ./bootstrap
Now do the following:
% /sw/bin/pathsetup.sh     # this sets up your path (if it wasn't already)
% fink selfupdate-rsync    # this updates your fink distribution
% fink index -f            # this makes an index
% fink scanpackages        # this figures out what packages are available.
If you want a complete list of packages, try fink list. We are after GMT now, so
install it and some other related packages using Fink:
% fink install gmt
% fink install gmt-coast
% fink install gmt-doc
The high resolution GMT coastlines may not be the most up to date in the most
recent version. If desired, go to ftp://ftp.soest.hawaii.edu/gmt and download the
following high resolution coastline archives:
GSHHS2.0.2_coast.tar.bz2
GSHHS2.0.2_full.tar.bz2
GSHHS2.0.2_high.tar.bz2
In the Finder go to the download folder and double click on each of the three
archives. They will be unpacked as folders. Now in Terminal run the following
commands (please note that the version number of your downloaded packages may
be newer than in the commands below and may need to be modified):
% cd ~/Downloads
% sudo cp GSHHS2.0.2_coast/share/coast/*.cdf /sw/share/gmt/coast
% sudo cp GSHHS2.0.2_high/share/coast/*.cdf /sw/share/gmt/coast
% sudo cp GSHHS2.0.2_full/share/coast/*.cdf /sw/share/gmt/coast
You should also install ps2eps (PostScript to Encapsulated PostScript), gv (Ghostview)
and ImageMagick (which includes the convert program to convert .eps to image files
like .png):
% fink install ps2eps
% fink install gv
% fink install imagemagick

Now that you have installed GMT, let's take it out for a spin. There is a
GMT website at http://gmt.soest.hawaii.edu/ that contains a GMT tutorial, and
a number of examples. The best strategy for figuring things out is to look through
the examples and find something close to what you want. Use that as a starting
point. Some other sites that may be helpful are:
http://geophysics.eas.gatech.edu/classes/Intro_GMT/ (tutorial)
http://www.ruf.rice.edu/~ben/gmt.html (useful links to things to plot).
Once you have learned a little bit about how to use GMT, it's helpful to use
the help documents as your primary reference. You can access these at the UNIX
command level by simply entering man pscoast or even just pscoast. But be
warned, GMT is not very user friendly. If you are unfamiliar with some of the
features of the UNIX environment like piping output, etc., you may find it fairly
intimidating at first. However, it is extremely powerful in the number of things that
it can do and the time spent learning how it works will not be wasted as you will
become familiar with many useful UNIX tools. GMT is capable of producing very
nice maps and plots and is pretty much the industry standard for Earth Science
map making.
We begin with an example using the tool pscoast. You will ALWAYS want to
run GMT as a UNIX script (see Chapter 2). Here is one called do.gmt1:

#!/bin/csh
pscoast -R0/360/-90/90 -JQ180/6i -B60g30 -P -Dc -G200 >! map.ps
ps2eps -f -g -l map.ps
gv map.eps
The first line, #!/bin/csh, invokes the C-shell environment. The script will run
without this, but it is probably safest to always go into the C-shell because all of the
example GMT scripts do this. Presumably in some cases it may make a difference.
Next we encounter pscoast which is the GMT program that draws coastlines.
This program has a lot of options that, in UNIX style, are invoked with a dash and
a letter on the same line (e.g. -P, -Dc, etc.). The output of the program is directed
to the Postscript file map.ps. Note that we use >! instead of > to avoid getting an
error message if map.ps already exists; but beware, you will overwrite any existing
map.ps file.

To get a full list of options, simply type the command name alone. But for now,
let us examine the arguments in our example:
-R0/360/-90/90
This sets the map limits of longitude (lon1=0, lon2=360) and latitude (lat1=-90,
lat2=90). Note that we could have written this with a space between the R
and the zero (0) immediately following. Most people leave this space out to more
easily separate the different commands.
-JQ180/6i
This specifies the map projection to be cylindrical equidistant (a simple linear
scaling of lat/lon). The center meridian is set to 180 degrees; the plot width is set
to 6 inches (the i means inches). A very large number of different map projections
are available!
-B60g30
This sets the labeled lat/lon lines to 60 degree intervals and the unlabeled lines
to 30 degree intervals.
-P
Specifies portrait mode so that the plot is not rotated by 90 degrees as in
landscape mode (default).
-Dc
Sets the resolution of the coastline to c for crude. This is all that is required
for a small map of the whole globe. For larger maps or closeups, higher resolution
will be required. The available options are: (f)ull, (h)igh, (i)ntermediate, (l)ow and
(c)rude. Note that the full resolution files require over 55 Mb of data and provide
great detail. It takes more time to generate the plot, too, so full resolution should
generally only be used for extreme close ups. The default is (l)ow resolution.
-G200
Specifies the grayshade level for the continent shading from 0 (black) to 255
(white). These nonintuitive units are a Postscript convention!
Finally, there are two more commands that let us view the plot without any
fancy store bought applications. These are ps2eps and gv. The former converts the
PostScript file to an encapsulated PostScript (EPS) file and the latter
calls Ghostview, a handy quick look program for EPS files.
After running the script do.gmt1, you should get a popup window that looks like
this:

If you open the file in PageView, you will notice that there is a lot of white space
around the plot and it is not centered on the page. The first problem can be fixed
by setting the page size to Letter using the command gmtset PAPER_MEDIA
Letter. But, if you generate a plot that is bigger than that, GMT will truncate it. If
you always want to be safe from that problem, set the media type to ArchE, which
is poster size. The second problem arises because the default position for the lower
left corner is at x = 1 inch, y = 1 inch. We can change this by specifying the x and y
positions directly:
-X1.2i -Y4i
This sets the lower left corner to 1.2 inches from the left edge and 4 inches from
the bottom.

You may also decide that we don't want grid lines drawn on the plot, so we
remove the g30 from the -B command:
-B60
This puts a label on the latitudes and longitudes every 60 degrees, but doesn't draw the
grid.
You may prefer a smaller font. We can do this with a different program which
we run before calling pscoast:
gmtset ANOT_FONT_SIZE 10
Finally, you may decide you want the continents to be a hideous green color. For
this, use the -Gred/green/blue option, where red, green and blue are numbers
between 0 and 255. The resulting script (do.gmt2) is:
#!/bin/csh
gmtset PAPER_MEDIA Letter
gmtset ANOT_FONT_SIZE 10
pscoast -R0/360/-90/90 -JQ180/6i -B60 -P -X1.2i -Y4i \
   -Dc -G0/255/0 >! map.ps
ps2eps -f -g -l map.ps
gv map.eps
Note that we used a backslash to indicate that the line continues to the next
line. In this way we can avoid making our lines too long.
This script produces something like this (is the green hideous enough for you?):
[Figure: world map produced by do.gmt2, with green continents and longitude/latitude labels every 60 degrees]


Next, suppose you have a file containing the coordinates of some seismic stations
which we wish to plot on this map. The file is called station.list and its first five
lines are:
  9.02920   38.76560 2442 AAE
 42.63900   74.49400 1645 AAK
 37.93040   58.11890  678 ABKT
 51.88370 -176.68440  116 ADK
-13.90930 -171.77730  706 AFI

Here is a script (do.gmt3) that will plot these points on our map:
#!/bin/csh
gmtset ANOT_FONT_SIZE 10
pscoast -R0/360/-90/90 -JQ180/6i -B60 -P -Dc -G200 \
   -X1.2i -Y4i -K >! map.ps
psxy -O -R -JQ180/6i -St0.06i -G0 -: station.list >> map.ps
ps2eps -f -g -l map.ps
gv map.eps
The first part of the script is the same as before, except that the -K is necessary
to tell GMT to keep the Postscript file open (i.e., don't put a showpage at the
end of the file) so that more can be added to the file.
Next, we use psxy to read and plot the x-y points from file station.list. We use
>> to append the output onto the end of the map.ps file. Note that > or >! would
not work here because it would overwrite the file instead of adding to it.
The arguments of psxy are as follows:
-O
Indicates that this is an overlay onto an existing Postscript file, so that the
initializations at the beginning of the file are skipped.
-R
Sets the plot boundaries (defaults to those set by pscoast).
-JQ180/6i
Sets the map projection (according to the manual, this should work without the
180/6i but I did not find this to be true).
-St0.06i
Plots xy points as (t)riangles of 0.06 inch width. Other options are (c)ircle, (d)iamond,
(s)quare, (i)nverted triangle, (x)cross, and (v)ector. In the case of the vector, the
direction and length are also read from the file (see manual).
-G0
Sets the fill parameter to 0 (black). This will fill in the triangles.
-:
This tells the program that the data are to be read as y-x pairs. The default is x-y, or
(lon,lat) in our case. For seismology this option is very useful because coordinates
are usually given as lat, lon rather than the other way around.
Note that the program automatically converts the longitude convention of the data
points (-180 to 180) to the longitude convention of the map (0 to 360). This is a
nice feature of GMT.
[Figure: the map produced by do.gmt3, with the stations plotted as small black triangles]


Now let's try a different map projection, plot the points in red, and add a title
in a script called do.gmt4:
#!/bin/csh
gmtset ANOT_FONT_SIZE 10
pscoast -R-180/180/-90/90 -JH0/6i -Bg0:."IRIS FARM stations": \
   -P -Dc -G150 -X1.2i -Y4i -K >! map.ps
psxy -O -R -JH -St0.06i -G255/0/0 -: station.list >> map.ps

The changed commands are as follows:
-JH0/6i
Invokes the equal-area Hammer projection with 6 inch width.
-Bg0:."IRIS FARM stations":
The 0 results in no grid lines or labels. The title is set off with :. and : (weird!).
-G255/0/0
Defines the fill for the xy plot as red=255, green=0, blue=0.
To my taste, the title is way too big. This can be changed using the gmtset
HEADER_FONT_SIZE command (see below).
Next, let's look at a closeup of these stations in southern California with the
script do.gmt5:
#!/bin/csh
gmtset HEADER_FONT_SIZE 20
gmtset ANOT_FONT_SIZE 10
pscoast -R-121/-114/32/37 -JM6i -B1g1:."IRIS FARM stations": \
   -P -Di -I1 -I2 -I3 -N1 -N2 -G255/200/200 -X1.2i -Y4i -K >! map.ps
psxy -O -R -JM -St0.06i -G0 -: station.list >> map.ps
Changes are:
gmtset HEADER_FONT_SIZE 20
Set font size for title to 20.
-R-121/-114/32/37
Set lon1,lon2,lat1,lat2 to S. California.
-JM6i
Use Mercator projection, width = 6 inches.
-B1g1:."IRIS FARM stations":
Label and draw grid lines every 1 degree. Use same title.
-I1
Plot permanent major rivers.
-I2
Plot additional major rivers.
-I3
Plot additional rivers.
-N1
Plot national boundaries.
-N2
Plot state boundaries within the Americas.
-G255/200/200
Fill land areas with red=255, green=200, blue=200 - a ghastly shade of pink:

[Figure: Mercator map of southern California produced by do.gmt5, titled "IRIS FARM stations", with pink land, a 1 degree grid, and axis labels running from 239 to 246 in longitude and 32 to 37 in latitude]


Next, let's add lines that show the traces of mapped faults in southern California.
For this, we use a file called calif.flts that has the following format:
370.0000 99.0000
-115.5496 32.9312
-115.5419 32.9142
-115.5358 32.9029
-115.5276 32.8890
370.0000 99.0000
-115.9218 32.9916
-115.9096 32.9849
-115.8936 32.9745
-115.8729 32.9655
-115.8398 32.9498
-115.8216 32.9410
-115.7983 32.9295
-115.7796 32.9228
370.0000 99.0000
-115.8391 33.0127
-115.8205 33.0069
-115.8017 33.0015
-115.7892 32.9973
etc.
The faults are defined as (lon,lat) pairs. A value of (370,99) is used to separate
the different faults because 370 is off the map (only goes to 360!). To plot these
faults on our map of the southern California stations, we can use the psxy command
a second time in the script do.gmt6:
#!/bin/csh
gmtset HEADER_FONT_SIZE 20
gmtset ANOT_FONT_SIZE 10
pscoast -R-121/-114/32/37 -JM6i -B1:."IRIS FARM stations": \
   -P -Di -I1 -I2 -I3 -N1 -N2 -G255/200/200 -X1.2i -Y4i -K >! map.ps
psxy -O -R -JM -M370 -W8/255/0/0 calif.flts -K >> map.ps
psxy -O -R -JM -St0.15i -G0/0/255 -: station.list >> map.ps

Changes are:
-B1:."IRIS FARM stations":
We removed the g1 so we don't plot grid lines, which might get confused with the
faults.
We plot the faults using:
psxy -O -R -JM -M370 -W8/255/0/0 calif.flts -K >> map.ps
In this command, we use the -M option to flag the segment boundaries:
-M370
Lines starting with 370 mark segment boundaries. Without this option the plot
would look like a mess because lines would be drawn to the (370,99) points.
-W8/255/0/0
This draws the line with linewidth=8 (thicker than normal) and color red=255,
green=0, blue=0.
Note that we do not need the -: option because the points are already given as
(lon,lat).
We plot the stations last so that they will go on top of the faults. We change
the symbol size and the color:
-St0.15i
plot triangles 0.15 inches high
-G0/0/255
plot with red=0, green=0, blue=255
Note that -K is needed on every command except the last; -O is needed on every command except the first. (***THIS IS KEY TO GETTING GMT SCRIPTS
TO WORK PROPERLY!!! CHECK THIS FIRST WHEN YOU HAVE PROBLEMS.***)
The script do.gmt6 results in the plot:

[Figure: the southern California map produced by do.gmt6, titled "IRIS FARM stations", with the faults drawn as thick red lines and the stations as blue triangles]

(Are there really no faults in eastern southern California?)


ASSIGNMENT GMT1
Install Fink and GMT if you haven't already. Write a script that plots your
birthplace as a big (0.2 in) red star. Use an orthographic projection with the star at
lon0/lat0. Draw all rivers, national and state boundaries. Make a title in 20pt font
with your name. Look up how to annotate the point with text (HINT: pstext) and
label the star with the name of the place you were born. Turn in the script and the
.eps file produced by it (via e-mail to ltauxe@ucsd.edu).

Chapter 4

Fortran
Why learn Fortran? The answer for geophysics students at SIO is obvious: most
IGPP faculty program in Fortran. However, let me make a broader pitch for its
usefulness. For many years, Fortran was the language of choice for scientific programming. Recently, C has emerged as a competitor and seems to now be much
more widely taught to students, perhaps because it is favored by computer science
departments (who worry more about writing compilers than how to handle complex
numbers).
Why learn Fortran if you already know C? Several reasons come to mind:
1. You will communicate and exchange software more readily with most IGPP
faculty, who are generally proficient in Fortran but rather challenged in C.
2. There is a huge library of existing Fortran subroutines to perform various
algorithms, both at IGPP and in the wider scientific community.
3. Complex numbers and double precision are included.
4. It's fun!
Finally, some cranky advice: It is important to become familiar with at least
one major programming language. Fortran and C are such languages; Matlab and
other application programs are not. In the long run, you will handicap your ability
to do science if you do not take the time to gain experience with a real programming
language. In the worst case, you will be one of those annoying people who are forever
asking others to write or modify programs for you.

4.1 Fortran history

The name Fortran is derived from the IBM Mathematical FORmula TRANslation System. Originally it was spelled in all caps as FORTRAN, but the more modern
usage is to only capitalize the first letter. Fortran is one of the oldest computer
programming languages and was begun in 1954 by IBM programmers led by John
Backus (no relation to George). It is updated and improved by some committee
of computer scientists every ten years or so. Important versions include:
Fortran IV (released in 1962)
Fortran 77 (released in 1980), major revision
Fortran 90 (released in 1991), major revision
Fortran 95 (released in 1997), minor revision
Wikipedia also lists Fortran 2003 and 2008, which I don't know anything about. I
would not recommend using these at this point in time. Most versions are designed to
be fully backward compatible with previous versions. However, Fortran90 departed
from the column sensitive format of older Fortran, resulting in a more modern
approach that necessitated some slight incompatibilities with older Fortran. I will
try to teach this class entirely in Fortran90, but will describe the differences with
Fortran77 when they are important because you are likely to need to work with
existing Fortran77 code at some point.

4.2 Texts and manuals

This class will be a tutorial on Fortran90 and will not be comprehensive. Thus, I
recommend that you buy a textbook that will serve as a more complete reference.
I just checked on Amazon and there are 4 or 5 books out there. One that I have
used and recommend is: Fortran 90/95 Explained (Metcalf and Reid, Oxford
Univ. Press, 1999), which has some favorable reviews. It's only $47. There is an
updated version that I have not seen for $53. You may want to check around to see
which book you like best.
In class, we will use gfortran, which can be downloaded free for the Macs. To
make sure you have it in your path, enter: which gfortran and you should get
something like /usr/local/gfortran/bin/gfortran
I recommend that you add the following to your .cshrc file:
alias f90 gfortran


This will allow you to just type f90 instead of gfortran when you want to compile
programs. The examples that follow assume that you have done this.

4.3 Compiling and running F90 programs

Fortran90 programs are written as ASCII files that end in .f90. This is called the
source code. Programs in older versions of Fortran end in simply .f. Because of the
slight incompatibilities between the versions, be sure to use the appropriate suffix
so that both you and the compiler know which version to use. It is in fact possible
to set things up so that your F90 programs end in .f rather than .f90. I do not
recommend this because there is so much existing code around in Fortran77 that
confusion is likely to result if you do not explicitly identify your code as F90.
Before the program can be run, it must be compiled using the Fortran90 compiler on your computer, creating the executable file that you use to actually run
the program. For example, suppose you have a program called printmess.f90 with
contents:
! simple Fortran90 test program (printmess.f90)
program printmess
print *, "test message"
end program printmess
To compile this program from my computer (rock in this case), I enter f90
printmess.f90 -o printmess and get the following response:
rock% f90 printmess.f90 -o printmess
rock%
No news is good news! If there were a syntax or other error detected during the
compilation, we would get an error message at this point. This creates an executable
version of the program called printmess (you should ALWAYS use the name of the
program without the .f90 for the executable), which can then be run simply by
entering ./printmess:
rock% ./printmess
test message
rock%
Note that because . is not in our path, we need to type ./printmess and not just
printmess.
If you are like me, you will quickly tire of typing in the line f90 printmess.f90 -o
printmess. Thus, I recommend that you create a Makefile (conventionally spelled
with a capital M) in the same directory as the program. Include in the Makefile
the following lines:
%: %.f90
	gfortran $< -o $*
The tricky part of this is that the space before the gfortran MUST be entered
as a tab, not as a series of spaces.

Once you have set up this Makefile, then you can just enter:
make printmess
in order to compile the program. Makefiles are very useful to keep track of compiler
options and to bind with subroutines (more about this later).
The difference between source code and executables is fundamental to languages
like FORTRAN and C. It is why they generally run much faster than interpreted
languages like BASIC or Matlab or Python scripts.

4.3.1 The first program explained

OK, now let's examine our simple F90 test program again:
! simple Fortran90 test program (printmess.f90)

program printmess
   print *, "test message"
end program printmess
The first line is a comment line. Anything following an exclamation mark (!) is
a comment. The end of the line terminates the comment. There is no need (as in
many languages) to terminate the comment with another flag. We can also use ! to
add an inline comment following a statement, e.g.,
print *, "test message"     !print message on screen

would be a valid line. The next line is blank. You can put in as many blank lines as
you want and they will be ignored by the compiler. Blank lines provide a good way
to improve the readability of longer programs by breaking them up into coherent
blocks of code.

The next line is used to name the program. The name printmess is not used by
the program at all. This line is optional but is considered good programming style.
Good style also suggests that lines in the body of the program are indented, in this
case by three spaces:
   print *, "test message"
print * will output to the screen. The desired output is enclosed in quotes (apostrophes would also work). Finally, all F90 programs must terminate with an end
statement. The program printmess following the end is optional, but is considered good programming practice. These style points don't matter much for a short program like this,
which would work just as well if it were written as:
print *, "test message"
end
but are helpful for longer programs (hundreds to thousands of lines of code) where
the label following the end would remind a reader of which program is actually
ending.
ASSIGNMENT F1
Write an F90 program to print your favorite pithy phrase.

4.3.2 How to multiply two integers

Here is a simple F90 program to multiply 2 and 3:
program multint
   integer :: a, b, c     !declare variables
   a = 2
   b = 3
   c = a * b
   print *, "Product = ", c
end program multint
The program uses three variables, the letters a, b and c. Variable names can
be from 1 to 31 characters long. The first character must be a letter. The remaining characters can be any combination of letters, numbers, and underscores (_).
Variables in Fortran can be of many types, including real (floating point), integer,
complex, double precision, character and logical. In our case, we want them to be
integers so we define them using a type statement
integer :: a, b, c     !declare variables

Older Fortran programs do not include the :: in these statements; this convention
still works under F90 but is discouraged. For the Sun F90 compiler, these are 4-byte
integers that can range from -2,147,483,648 to 2,147,483,647. Alternatively,
they could have been defined as real numbers:
real :: a, b, c
in which case they would be floating point (real) variables. For the Sun compiler,
real numbers range in magnitude from 1.175494e-38 to 3.402823e+38.
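These limits are for one particular compiler. If you want to check them on your own machine, here is a minimal sketch (the program name showlimits is made up for this example) that uses the standard F90 intrinsic functions huge and tiny, which return the largest value and the smallest positive value of a given type:

```fortran
! print the limits of the default integer and real types
! (showlimits is a hypothetical name chosen for this sketch)
program showlimits
   implicit none
   integer :: i     !default integer, only its type matters here
   real :: x        !default real, only its type matters here
   print *, "largest integer        = ", huge(i)
   print *, "smallest positive real = ", tiny(x)
   print *, "largest real           = ", huge(x)
end program showlimits
```

With gfortran, where default integers and reals are normally both 4 bytes, the numbers should agree with those quoted for the Sun compiler.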
The remaining lines of the multint program are pretty self explanatory:
a = 2
b = 3
c = a * b
print *, "Product = ", c
Note that * is used to indicate multiplication as is true in almost all programming
languages. Addition is +, subtraction is -, and division is /. In Fortran, a to the
power of b is written as a**b where a and b can both be real, another Fortran
advantage over standard C.
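Here is a minimal sketch that exercises these operators with real variables. The program name arith and the values of a and b are arbitrary choices for illustration:

```fortran
! try out the arithmetic operators, including ** with real arguments
! (arith and the values below are made up for this sketch)
program arith
   implicit none
   real :: a, b
   a = 2.0
   b = 0.5
   print *, "a + b  = ", a + b
   print *, "a - b  = ", a - b
   print *, "a * b  = ", a * b
   print *, "a / b  = ", a / b
   print *, "a ** b = ", a ** b     !2 to the power 0.5, i.e., sqrt(2)
end program arith
```

The last line raises 2 to the power 0.5, so it should print a value close to 1.414.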

4.4 An important historical digression

It is not required that variables be declared in Fortran. If they are not declared,
then the compiler assigns them as integer or real, depending upon their first letter.
Variable names beginning with i through n (INteger, get it?) are assumed to be
integers; all others are assumed to be real. Many, if not most, older Fortran programs
adopt this convention. Often they only declare variables when the rule is broken,
for example when it is desired that the variable year be an integer. A prominent
example of this type of Fortran programming may be found in the first edition (1986)
of Numerical Recipes (but by the time of the second edition in 1992, the authors
had been shamed into declaring all variables).
Such undeclared variables are said to be declared implicitly based upon their
first letter. An implicit statement can be used to modify the default rules, for
example
implicit real (a-z)


will make all undeclared variables real. However, modifications such as this will only
lead to more confusing code. Modern programming practice is to explicitly declare
ALL variables and to have the compiler warn us when variables are not declared.
This can be done by including the statement
implicit none
at the beginning of every Fortran program. I will admit that, as a long time Fortran
programmer, I do not always follow this practice. I must concede, however, that it is
a good idea and is likely to save more time in the long run (by eliminating program
bugs that are often caused by undeclared variables) than is lost while writing the
program.
Example 1: Suppose you should have the following line in your program:
x2 = a1 + a2*sin(theta)*scale1/scale2
but you accidentally type:
x2 = a1 + a2*sin(theta)*scalc1/scale2
If you don't follow the practice of declaring all of your variables, then the error
will not be detected during compilation. The program will run, using whatever value
happens to be stored in the otherwise unused variable scalc1 (often zero), and you
will get wrong answers. On the other
hand, if you use implicit none and declare your variables, the compiler will flag
scalc1 as an undeclared variable and you can fix it before it causes any more trouble.
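As a sketch of what the safer version looks like, here is Example 1 wrapped in a complete program with implicit none. The program name and the numerical values are made up for illustration:

```fortran
! Example 1 with every variable declared
! (example1 and the values assigned below are hypothetical)
program example1
   implicit none
   real :: x2, a1, a2, theta, scale1, scale2
   a1 = 1.0
   a2 = 2.0
   theta = 0.5
   scale1 = 3.0
   scale2 = 4.0
   x2 = a1 + a2*sin(theta)*scale1/scale2
   print *, "x2 = ", x2
end program example1
```

If you now change scale1 to scalc1 in the line that computes x2, the compiler should refuse to build the program and report that scalc1 has no implicit type, which points you straight at the typo.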
Example 2: Suppose you have the following lines in your code:
kmdeg = 111.19
dist = (delta2 - delta1)*kmdeg
but you have not declared kmdeg explicitly. Because the letter k is between i and n,
the compiler will treat kmdeg as an integer. The value 111.19 will be truncated
to 111 and you will get values for dist that are slightly, but not obviously, wrong
(the worst kind of program bug to have).
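To see Example 2 in action, here is a small self-contained sketch. The program name and the two delta values are made up for illustration, and kmdeg is deliberately left undeclared (do not do this in your assignments!):

```fortran
! Example 2: implicit typing silently truncates kmdeg
! (kmdegtest and the delta values are hypothetical)
program kmdegtest
   real :: delta1, delta2, dist
   delta1 = 10.0
   delta2 = 12.0
   kmdeg = 111.19                     !undeclared and starts with k, so it is an integer
   dist = (delta2 - delta1)*kmdeg
   print *, "kmdeg = ", kmdeg
   print *, "dist  = ", dist
end program kmdegtest
```

The program prints 111 for kmdeg and 222 for dist, rather than the 222.38 that we intended. Adding implicit none after the program line should instead make the compiler stop and complain that kmdeg has no type.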
To save you from these embarrassments and to encourage good programming
habits, we will declare all variables for the programs in this class and I will dock
you points if you fail to do so in assignments.
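To see the implicit typing rule of Example 2 in action, here is a minimal sketch (the program name kmdemo and the variable rkmdeg are made up for illustration). It deliberately omits implicit none so that the default first-letter rule applies:

```fortran
program kmdemo
   ! implicit none is deliberately left out so the default rule applies
   kmdeg = 111.19       ! name starts with k: implicitly integer, stores 111
   rkmdeg = 111.19      ! name starts with r: implicitly real
   print *, "kmdeg  = ", kmdeg
   print *, "rkmdeg = ", rkmdeg
end program kmdemo
```

Adding implicit none at the top of this sketch should turn both assignments into compile-time errors, which is exactly the warning we want.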


ASSIGNMENT F2
Copy the program multint but leave out the integer :: a, b, c statement. What
happens when you run the program? Why?
ASSIGNMENT F3
Cut and paste the following defective program onto your computer:
program longjump
   beamon_long = 8.90      !distance in meters
   powell_long = 8.95
   dif_inch = (powell_long - beamon_1ong) * 39.37
   print *,"Powell jumped ",dif_inch," inches more than Beamon"
end program longjump
Compile and run the program. Then explain why it gives the wrong answer.

4.4.1

Alternate coding options

There are always lots of ways to write the same program. Here is another way to
write the multint.f90 code:
program multint2
   implicit none
   integer :: a=2, b=3, c      !declare variables
   c = a * b; print *, "Product = ", c
end program multint2
First, notice that variables can be assigned values when they are declared (OK
in F90, don't try this in F77). Second, notice that more than one command can be
included on a line if the commands are separated by a semicolon (again, only OK in
F90). Both of these changes make the code more similar to C. The variable assigning
option is a reasonable convenience, but the multiple command option is certainly
misused in this case because it makes the code much harder to read. Unless you
have a really good reason to put more than one command on a line (saving space is
NOT a good reason!), I suggest that you never use semicolons in this way.

4.5

Making a formatted trig table using a do loop

Any reasonable programming language must provide a way to loop over a series
of values for a variable. In C, this is most naturally implemented with the for
statement. In FORTRAN this is done with the do loop.
Here is an example program that generates a table of trig functions:


program trigtable
   implicit none
   integer :: itheta
   real :: theta, stheta, ctheta, ttheta, degrad
   degrad = 180./3.1415927
   do itheta = 0, 89, 1
      theta = real(itheta)
      ctheta = cos(theta/degrad)
      stheta = sin(theta/degrad)
      ttheta = tan(theta/degrad)
      print "(f5.1,1x,f6.4,1x,f6.4,1x,f7.4)", theta, ctheta, stheta, ttheta
   enddo
end program trigtable
Fortran (like C and Matlab) uses radians (not degrees) as the arguments of the
trig functions. Thus, following the definitions of the variables as real, we assign
degrad to 180/pi so that we can easily make this conversion.
do itheta = 0, 89, 1
This begins the do loop, which must eventually be closed with the enddo statement. For clarity, we indent the inside of the loop. This is a loop over values of itheta
from a starting value of 0, incremented by 1 each time, until itheta is greater than
89. The 1 is actually optional as it is the default increment. Thus itheta will assume
the values (0, 1, 2, ..., 88, 89) inside the loop. Notice that we use an integer for the
do loop. Older versions of Fortran (e.g., F77) permitted the use of real variables
in do loops. This is not recommended and roundoff peculiarities mean that the do
loop would need to be written in the form:
do theta = 0.0, 89.1, 1.0      !****WON'T WORK IN F95!

Note that we use 89.1 rather than 89.0 as the ending value to avoid the possibility
that roundoff error might cause the desired ending value (computed by successively
adding 1.0 to theta) to slightly exceed 89 and thus be excluded from the loop. I
have a lot of old code of this form, but I'm going to slowly try to get rid of it.
Inside the do loop we first convert from the integer itheta to the real theta using:
theta = real(itheta)
We then compute the cosine, sine and tangent of theta, after converting from
degrees to radians by dividing by degrad (anybody see how we could make the
program slightly more efficient?). To make the output look nice, we do not use

print *, theta, ctheta, stheta, ttheta

which would space the numbers irregularly among the columns. Instead, we explicitly specify the output format using a format specification:
print "(f5.1,1x,f6.4,1x,f6.4,1x,f7.4)", theta, ctheta, stheta, ttheta
where f5.1 specifies that theta will be output as a real number into 5 total spaces,
with 1 digit to the right of the decimal place. Similarly, f6.4 specifies that ctheta is
output into 6 spaces with 4 digits to the right of the decimal place. The numbers will
be right justified, with leading blanks used as necessary. The 1x specifies that one
blank character will be output between each of the numbers.
An alternative way to write this:
print "( f5.1,   f7.4,   f7.4,   f8.4)", &
        theta, ctheta, stheta, ttheta
Here we have used the continuation character & to split the statement into two
lines. Blanks are ignored so we can space things out neatly to line up the variables
with their formats. Notice that we have removed the need for the 1x between formats
by adding an additional column to the appropriate formats (i.e., writing f7.4 rather
than f6.4). Because the numbers are right justified, this will add an additional space
to the left of each number. Aligning things this neatly is probably more trouble than
it's worth, but it certainly makes the code easier to understand.
Notice that in this case the f7.4 appears twice in a row. Often programmers will
write this more compactly as:
print "(f5.1, 2f7.4, f8.4)", theta, ctheta, stheta, ttheta
since Fortran allows this syntax.
Older Fortran programs usually put the format specifier into a separate numbered line, i.e.,
      print 117, theta, ctheta, stheta, ttheta
117   format (f5.1, 2f7.4, f8.4)

This convention is still allowed in F90 but should not be used unless you want to
use the format statement more than once, e.g.,

      print 117, theta1, ctheta1, stheta1, ttheta1
      print 117, theta2, ctheta2, stheta2, ttheta2
117   format (f5.1, 2f7.4, f8.4)

117 is termed a line label and must consist of digits. It is best not to refer to it as
a line number (the old usage) because it normally has nothing to do with the line
numbers and the labels need not be sequential (more about this later). Note that
the format line need not immediately follow the print line(s); it can come before.
Many older programs put all of the format statements at the end of the code.
An alternative way in F90 to use the same format specification more than once
is to define it as a character variable, i.e.,
character (len = 30) :: fmt
fmt = "(f5.1, 2f7.4, f8.4)"
print fmt, theta1, ctheta1, stheta1, ttheta1
print fmt, theta2, ctheta2, stheta2, ttheta2
but we are getting ahead of ourselves because we have not yet shown how to use
character variables.
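As a minimal, self-contained sketch of the character-variable approach (the program name fmtdemo and the two test angles are chosen here purely for illustration), the same format string is reused for two print statements:

```fortran
program fmtdemo
   implicit none
   character (len = 30) :: fmt
   real :: theta1, theta2, degrad
   degrad = 180./3.1415927
   theta1 = 30.
   theta2 = 60.
   ! one format string, stored once and used for both print statements
   fmt = "(f5.1, 2f7.4, f8.4)"
   print fmt, theta1, cos(theta1/degrad), sin(theta1/degrad), tan(theta1/degrad)
   print fmt, theta2, cos(theta2/degrad), sin(theta2/degrad), tan(theta2/degrad)
end program fmtdemo
```

If the column widths ever need to change, only the single assignment to fmt has to be edited.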

4.5.1

Fortran mathematical functions

We used the Fortran sine, cosine and tangent functions in the trigtable program.
Here is a complete list of math functions:
acos(x)      arccosine
asin(x)      arcsine
atan(x)      arctangent
atan2(y,x)   arctangent of y/x in correct quadrant (***very useful!)
cos(x)       cosine
cosh(x)      hyperbolic cosine
exp(x)       exponential
log(x)       natural logarithm
log10(x)     base 10 log
sin(x)       sine
sinh(x)      hyperbolic sine
sqrt(x)      square root
tan(x)       tangent
tanh(x)      hyperbolic tangent

4.5.2

Possible integer vs. real problems

As an aside, note that the trigtable program uses:

degrad = 180./3.1415927
rather than simply
degrad = 180/3.1415927
The reason is to make completely sure that the program will compute a real
quotient and not an integer quotient. In fact, this caution is not needed in this case,
as the following program demonstrates:
program testfrac
   implicit none
   real c
   c = 2/3
   print *, "2/3 = ", c
   c = 2/3.
   print *, "2/3. = ", c
   c = 2./3
   print *, "2./3 = ", c
   c = 2./3.
   print *, "2./3. = ", c
end program testfrac
Running the program yields:
2/3 = 0.0E+0
2/3. = 0.6666667
2./3 = 0.6666667
2./3. = 0.6666667
As long as one part of the fraction is real, the program will compute a real
quotient. It is only when both numbers are written as integers that the result is
truncated. However, I have gotten into the habit of always including the decimal
point in real expressions to avoid someday accidentally writing something like:
a = sin(phi/degrad) + (2/3) * cos(theta/degrad)**2
which will definitely produce the wrong answer!
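The same trap appears when the operands are integer variables rather than constants, where adding a decimal point is not an option. The following sketch (the program name intdiv and the variables n and m are made up for illustration) shows the truncation and one way around it using the real() conversion function:

```fortran
program intdiv
   implicit none
   integer :: n = 2, m = 3
   real :: c
   c = n/m                  ! integer division is done first, so c becomes 0.0
   print *, "n/m             = ", c
   c = real(n)/real(m)      ! convert to real before dividing
   print *, "real(n)/real(m) = ", c
end program intdiv
```

Converting only one of the two operands would also be enough, for the same reason that 2/3. works above.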

4.5.3

More about formats

There are many different formats that can be specified. Here are some common examples:

i5    = integer, 5 spaces, right justified
i5.4  = as above but pad with zeros to 4 spaces (88 becomes 0088)
f8.3  = real, 8 spaces, 3 to right of decimal place
e12.4 = real output with exponent, 4 places to right of decimal
        e.g., b-0.2342E+02 where "b" is blank
        (useful for big or small numbers or when you are not
        sure what size they will be and want to be sure you
        have enough room to output them)
        (cranky aside: why always start with "0." which wastes
        a space? -2.342E+01 would be more compact)

If a number does not fit into the allocated spaces, it will appear as a series of
asterisks (*****)

a8    = character output, 8 spaces, right justified
        (if the length of the string is greater than 8,
        then the leftmost 8 characters will appear)
2x    = output two blanks
tn    = tab to position n
tln   = tab left n spaces
trn   = tab right n spaces (tr2 is the same as 2x)
/     = start a new line
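A quick way to get a feel for these descriptors is to try them on a couple of known values. Here is a small sketch (the program name fmtexamples and the test values are chosen only for illustration); the comments show the output I would expect, with blanks shown inside the quotes:

```fortran
program fmtexamples
   implicit none
   integer :: n = 88
   real :: x = -23.42
   print "(i5)", n          ! "   88"
   print "(i5.4)", n        ! " 0088"
   print "(f8.3)", x        ! " -23.420"
   print "(e12.4)", x       ! " -0.2342E+02"
   print "(a8)", "abc"      ! "     abc"
   print "(i2)", 12345      ! "**" because the number does not fit
   ! several descriptors combined: text, two blanks, integer, new line, text
   print "(a,2x,i3,/,a)", "n =", n, "next line"
end program fmtexamples
```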

ASSIGNMENT F4
Write a F90 program to print a table of x, sinh(x), and cosh(x) (the hyperbolic
sine and cosine, these are built-in functions in Fortran) for values of x ranging from
0.0 to 6.0 at increments of 0.5 (use these x values directly in sinh(x) and cosh(x), do
not convert them to radians). Use a suitable format to make nicely aligned columns
of numbers.

4.6

Input using the keyboard

So far all of our example programs have run without prompting the user for any information. To expand our abilities, let's learn how to input data from the keyboard.
In most programs, we will want to first prompt the user to input the data, so here
is an example of how to input two numbers:
print *, "Enter two integers"
read *, a, b
Pretty easy, isn't it? (Compare this section with the corresponding input section
in the C or Python notes and you will see why I think Fortran is more user friendly
than C or Python.)

Here is an example of a complete program that multiplies two numbers:

program usermult
implicit none
integer :: a, b, c
print *, "Enter two integers"
read *, a, b
c = a * b
print *, "Product = ", c
end program usermult
Running this program, we have:
rock% usermult
Enter two integers
2 5
Product = 10
The program will also accept the input on two lines:
rock% usermult
Enter two integers
2
5
Product = 10
Often in cases like this I will forget that I'm supposed to input more than one
number. When the program just sits there, I then realize that I need to input more
numbers (or I wonder why it's taking so long to finish!).
What happens if we make a mistake and try entering a real number? Let's check:
rock% usermult
Enter two integers
3.1 15
****** FORTRAN RUN-TIME SYSTEM ******
Error 1083: unexpected character in integer value
Location: the READ statement at line 4 of "usermult.f90"
Unit: *
File: standard input
Input: 3.1
^
Abort
rock%
This is what happens on my old Sun computer, but most Fortran compilers will
produce a similar message. The error message in this case is quite informative and


tells us exactly what the problem is. This is better than performing the computation
and returning the wrong answer (the default in C for this example unless you are
quite careful).

4.7

If statements

Next, let's modify this program so that it will allow the user to continue entering
numbers until he/she wants to stop:
program usermult2
   implicit none
   integer :: a, b, c
   do
      print *, "Enter two integers (zeros to stop)"
      read *, a, b
      if (a == 0 .and. b == 0) exit
      c = a * b
      print *, "Product = ", c
   enddo
end program usermult2
Here the do loop has no arguments and the block of code inside the do loop will
be repeatedly executed until an exit command is executed. Exit (a new feature
in F90) means to leave the do loop entirely and go to the next line after the enddo
statement. As in our previous example, we indent the block inside the do loop to
make the code easier to read.
The program will allow the user to continue entering numbers to be multiplied.
When the user wishes to stop the program (in a more elegant way than hitting
CNTRL-C), he/she enters zeros for both arguments. The if statement checks for
this and exits the do loop in this case:
if (a == 0 .and. b == 0) exit
A list of the relational (comparison) operators in different languages is as follows:
FORTRAN 77   FORTRAN 90   C      PYTHON   MATLAB   meaning
.eq.         ==           ==     ==       ==       equals
.ne.         /=           !=     !=       ~=       does not equal
.lt.         <            <      <        <        less than
.le.         <=           <=     <=       <=       less than or equal to
.gt.         >            >      >        >        greater than
.ge.         >=           >=     >=       >=       greater than or equal to
.and.        .and.        &&     and      &        and
.or.         .or.         ||     or       |        or
.not.        .not.        !      not      ~        not

The F77 syntax will still work under F90 and you are likely to see this in many
of the older programs, e.g.,
if (a .eq. 0 .and. b .eq. 0) exit
will also work. These operators can be combined to make complex tests, e.g.,
if ( (a > b .and. c <= 0) .or. d == 0) z = a
There is, of course, an order of operations for these things which I can't remember
very well. Look it up in a book if you are unsure or, better, just put in enough
parentheses to make it completely clear to anyone reading your code.
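For reference, standard Fortran evaluates the relational operators first, then .not., then .and., and finally .or.. The short sketch below (program and variable names are made up for illustration) checks that an unparenthesized test gives the same result as the fully parenthesized version:

```fortran
program precedence
   implicit none
   integer :: a = 1, b = 2, c = 5, d = 0
   logical :: t1, t2
   ! without parentheses: comparisons first, then .and., then .or.
   t1 = a > b .and. c <= 0 .or. d == 0
   ! the same test with the grouping written out explicitly
   t2 = ( (a > b) .and. (c <= 0) ) .or. (d == 0)
   print *, t1, t2          ! both are T for these values
end program precedence
```

The second form is longer but leaves nothing for the reader to remember.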
One nice aspect of Fortran compared to C is that if you make a mistake and
type, for example,
if (a = 0) exit
you will get an error message during compilation. In C this is a valid statement
with a completely different meaning than is intended!

4.7.1

If, then, else constructs

In the above example, a single statement is executed when the if condition is true.
A more versatile form is as follows:
if (logical expression) then
(block of code)
else if (logical expression) then
(block of code)
else if (logical expression) then
(block of code)
.
.
else
(block of code)
end if


The blocks of code can contain many lines if desired. As many else if statements
as required can be used. At most, one block of code will be executed (once one of the
if tests is satisfied, it does not check the others). The final else will be executed
if none of the preceding if statements is true. The final else is optional.
Here is a demonstration program that repeatedly prompts the user for a positive
real number. If the number is negative, it asks the user to try again. If it is positive,
it computes and displays the square root using the sqrt() function. If the user enters
zero, the program stops.
program usersqrt
   implicit none
   real :: a, b
   do
      print *, "Enter positive real number (0 to stop)"
      read *, a
      if (a < 0) then
         print *, "This number is negative!"
         cycle
      else if (a == 0) then
         exit
      else
         b = sqrt(a)
         print *, "sqrt = ", b
      end if
   end do
end program usersqrt
Notice the use of the cycle command (also new in F90), which directs the
program to the next iteration of the do loop. In contrast, the exit command exits
the do loop entirely. The cycle and exit commands permit code to be written that
is free of the go to statements that would likely have been present in a F77 version
of this program (go to statements are considered mortal sins by the programming
style police).
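The cycle command is also handy for recovering from bad input instead of letting the program abort with a run-time error. The sketch below is a variation on usersqrt (the name safesqrt and the variable ios are made up for illustration). It uses the iostat= option of the read statement, which has not been introduced in these notes: in standard Fortran it sets the integer to zero on success, to a positive value on a read error, and to a negative value at end of input.

```fortran
program safesqrt
   implicit none
   real :: a
   integer :: ios
   do
      print *, "Enter positive real number (0 to stop)"
      read (*, *, iostat=ios) a
      if (ios > 0) then          ! input could not be read as a number
         print *, "That is not a number, try again"
         cycle
      else if (ios < 0) then     ! end of input reached
         exit
      end if
      if (a == 0) exit
      if (a < 0) then
         print *, "This number is negative!"
         cycle
      end if
      print *, "sqrt = ", sqrt(a)
   end do
end program safesqrt
```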
ASSIGNMENT F5
Write a program to repeatedly ask the user for the constants a, b, and c in
the quadratic equation a*x**2+b*x+c=0. Using the quadratic formula, have the
program identify and compute any real roots. Output the number of real roots and
their values. Stop the program if the user enters zeros for all three values. HINT:
Test your program for some simple examples to make sure it is working correctly
(a=1, b=2, c=-3 should return -3 and 1).


4.8

Greatest common factor example

Here is an example program that uses some of the concepts that we have just learned:
! compute the greatest common factor of two integers
program gcf
   implicit none
   integer :: a, b, i, imax
   do
      print *, "Enter two integers (zeros to stop)"
      read *, a, b
      if (a == 0 .and. b == 0) then
         exit
      else
         do i = 1, min(a, b)
            if (mod(a, i) == 0 .and. mod(b, i) == 0) imax=i
         end do
         print *, "Greatest common factor = ", imax
      end if
   enddo
end program gcf
This is not a particularly efficient algorithm, but it runs plenty fast for small
numbers. There are some new things here:
1. min(a,b) computes the minimum of a and b using the intrinsic Fortran min
function. Naturally there is also a max function. min and max can have
more than 2 arguments if desired.
2. mod(a,i) computes the remainder of a divided by i (this is called the modulus).
If mod(a,i) is zero, then a is evenly divisible by i. If non-zero, then mod(a,i)
has the sign of a.
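Since the loop above is described as not particularly efficient, here is a sketch of one well-known faster alternative, Euclid's algorithm, which needs only the mod function. This is not the method used in these notes; the program name gcf_euclid and the variable r are made up for illustration, and the sketch assumes both inputs are positive.

```fortran
program gcf_euclid
   implicit none
   integer :: a, b, r
   print *, "Enter two positive integers"
   read *, a, b
   do
      if (b == 0) exit      ! when the remainder reaches zero, a holds the answer
      r = mod(a, b)         ! remainder of a divided by b
      a = b
      b = r
   end do
   print *, "Greatest common factor = ", a
end program gcf_euclid
```

For example, entering 12 and 18 steps through (18, 12), (12, 6), (6, 0) and prints 6.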
Here are some more useful Fortran functions:
abs(a)       absolute value
sign(a, b)   abs(a) with sign of b
real(a)      conversion to real (F77 float(a) still works)
int(a)       conversion to integer
nint(a)      nearest integer
ceiling(a)   least integer greater than or equal to number
floor(a)     greatest integer less than or equal to number

ASSIGNMENT F6
Modify gcf.f90 to compute the least common multiple of two integers.

4.9

User defined functions and subroutines

As the length and complexity of a computer program grows, it is a good strategy
to break the problem down into smaller pieces by defining functions or subroutines
to perform smaller tasks. This provides several advantages:
1. You can test these pieces individually to see if they work before trying to get
the complete program to work.
2. Your code is more modular and easier to understand.
3. It is easier to use parts of the program in a different program.
To illustrate how to define your own function, here again is the greatest common
factor program:
program gcf2
   implicit none
   integer :: a, b, gcf
   integer, external :: getgcf
   do
      print *, "Enter two integers (zeros to stop)"
      read *, a, b
      if (a == 0 .and. b == 0) then
         exit
      else
         gcf = getgcf(a,b)
         print *, "Greatest common factor = ", gcf
      end if
   enddo
end program gcf2

integer function getgcf(x, y)
   implicit none
   integer :: x, y, i, z
   do i = 1, min(x,y)
      if (mod(x,i) == 0 .and. mod(y,i) == 0) getgcf = i
   end do
end function getgcf
We now perform the gcf calculation in the function getgcf. The variables a and
b in the main program are the function arguments. They are passed to the function
in the statement:
gcf = getgcf(a, b)

The function getgcf definition begins with:

integer function getgcf(x, y)


The variables x and y in the function will assume the values passed to the
function by the main program. These arguments must match in number and type
(real vs. integer, etc.) with the variables in the main program. Note, however, that
they do not need to have the same names.
Notice that we must declare getgcf in the calling program:

integer, external :: getgcf


This is the preferred F90 syntax, although the following will also work:

integer :: getgcf
The name of the function is by default the value that will be passed back to
the main program. In some cases involving recursive functions (a more complicated
topic that we may cover later), it may be desirable to have the value returned to the
main program be specified by a different variable name. To do this, we can use the
optional result specifier in the function statement:
function getgcf(x, y) result(z)
   implicit none
   integer :: x, y, i, z
   do i = 1, min(x, y)
      if (mod(x, i) == 0 .and. mod(y, i) == 0) z = i
   end do
end function getgcf
The result(z) indicates that the result will be passed back to the calling program as
the variable z. Thus, gcf in the calling program will assume the value of z in the
function.
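To give a flavor of the recursive case mentioned above, here is a minimal sketch of a recursive factorial function (the names testfact, fact, n and f are made up for illustration, and this is not an example from these notes). In a recursive function the result specifier lets the function name itself be reserved for the recursive call, while the value is returned through f:

```fortran
program testfact
   implicit none
   integer, external :: fact
   print *, "5! = ", fact(5)      ! 5*4*3*2*1 = 120
end program testfact

recursive function fact(n) result(f)
   implicit none
   integer :: n, f
   if (n <= 1) then
      f = 1                       ! stopping condition
   else
      f = n * fact(n - 1)         ! the function calls itself
   end if
end function fact
```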
ASSIGNMENT F7
Modify your program from F6 to compute the least-common multiple as a user-defined function.

4.9.1

Subroutines

Functions are limited in their usefulness because they are designed to pass only
one value back to the calling program. A more general construct is the Fortran
subroutine, which allows unlimited numbers of values to be passed to and from the
calling program.
Here is our first geophysically useful example, a subroutine to compute the distance and azimuth between any two points on the Earth's surface:
program userdist
   implicit none
   real lat1, lon1, lat2, lon2, del, azi
   do
      print *, "Enter 1st point lat, lon"
      read *, lat1, lon1
      print *, "Enter 2nd point lat, lon"
      read *, lat2, lon2
      call SPH_AZI(lat1, lon1, lat2, lon2, del, azi)
      print *, "del, azi = ", del, azi
   end do
end program userdist
! SPH_AZI computes distance and azimuth between two points on sphere
!
! Inputs:  flat1 = latitude of first point (degrees)
!          flon1 = longitude of first point (degrees)
!          flat2 = latitude of second point (degrees)
!          flon2 = longitude of second point (degrees)
! Returns: del   = angular separation between points (degrees)
!          azi   = azimuth at 1st point to 2nd point, from N (deg.)
!
! Notes:
! (1) applies to geocentric not geographic lat,lon on Earth
! (2) This routine is inaccurate for del less than about 0.5 degrees.
!     For greater accuracy, use double precision or perform a separate
!     calculation for close ranges using Cartesian geometry.
!
subroutine SPH_AZI(flat1, flon1, flat2, flon2, del, azi)
   implicit none
   real :: flat1,flon1,flat2,flon2,del,azi,pi,raddeg,theta1,theta2, &
           phi1,phi2,stheta1,stheta2,ctheta1,ctheta2,               &
           sang,cang,ang,caz,saz,az
   if ( (flat1 == flat2 .and. flon1 == flon2) .or.    &
        (flat1 == 90. .and. flat2 == 90.) .or.        &
        (flat1 == -90. .and. flat2 == -90.) ) then
      del=0.
      azi=0.
      return
   end if
   pi=3.141592654
   raddeg=pi/180.
   theta1=(90.-flat1)*raddeg
   theta2=(90.-flat2)*raddeg
   phi1=flon1*raddeg
   phi2=flon2*raddeg
   stheta1=sin(theta1)
   stheta2=sin(theta2)
   ctheta1=cos(theta1)
   ctheta2=cos(theta2)
   cang=stheta1*stheta2*cos(phi2-phi1)+ctheta1*ctheta2
   ang=acos(cang)
   del=ang/raddeg
   sang=sqrt(1.-cang*cang)
   caz=(ctheta2-ctheta1*cang)/(sang*stheta1)
   saz=-stheta2*sin(phi1-phi2)/sang
   az=atan2(saz,caz)
   azi=az/raddeg
   if (azi.lt.0.) azi=azi+360.
end subroutine SPH_AZI
The subroutine is called with the statement:
call SPH_AZI(lat1, lon1, lat2, lon2, del, azi)
In this case, the lat/lon values are passed to the subroutine while del and azi are
passed back to the main program. However, note that if flat1, etc., were changed in
the subroutine, then the corresponding variable would also be changed in the main
program as well. lat1 in the main program and flat1 in the subroutine point to the
same memory location. If one is changed, the other automatically changes as well.
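This sharing of memory is easy to demonstrate with a tiny sketch (the names refdemo, DOUBLE_IT, x and y are made up for illustration). The subroutine changes its dummy argument y, and the change shows up in x in the main program:

```fortran
program refdemo
   implicit none
   real :: x
   x = 1.0
   call DOUBLE_IT(x)
   print *, "x after call = ", x      ! x is now 2.0
end program refdemo

subroutine DOUBLE_IT(y)
   implicit none
   real :: y
   y = 2.0 * y       ! y refers to the same memory as x in the main program
end subroutine DOUBLE_IT
```

This is convenient for returning results, but it also means a subroutine can silently change its inputs, so it pays to document which arguments are modified.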
Fortran is not case-sensitive so sph_azi and SPH_AZI have the same meaning. I
like to put subroutine names in all caps so they are more visible. Note that sph_azi
does not have to be declared in the main program.
The subroutine is well-documented in this case, explaining exactly what is going
into the subroutine and what is going out, as well as some of the limitations of
the routine. This may seem like overkill, but documenting your subroutines as
completely as possible is likely to save you considerable time later if you ever want
to use the routine again. It is good to document the routine well enough that you,
or someone else, can use it correctly without having to study the code itself. Clarity
is important: I have seen versions of this routine that do not make clear whether
the azimuth is measured at the first point to the second point, or vice versa. Listing
the limits of the subroutine may help prevent future misuse of the routine; in this
example it may prevent the naive user from assuming that the distance returned
is accurate when used with geographic latitude and longitude on the Earth. The
routine is also designed to be robust with respect to pathological inputs, such as
when the two points have the same coordinates.
ASSIGNMENT F8
Write a single subroutine that computes the volume, surface area, and circumference of a sphere, given its radius, together with a main program that inputs different
values for the radius from the keyboard and prints the results. Allow the user to
terminate the program by entering zero for the radius. E-mail me the source code
in a single file containing both the main program and the subroutine.

4.9.2

Linking to subroutines during compilation

A powerful aspect of subroutines is that one can link to compiled versions of existing
subroutines without having to recompile them. This means that you only have to
maintain one version of a subroutine; it need not be listed along with the source
code of the main program. For example, the SPH_AZI subroutine is also part of a
F77 package of spherical geometry subroutines contained in:
~shearer/PROG/SUBS/sphere_subs.f
Our main program could simply consist of:
program userdist2
   implicit none
   real flat1,flon1,flat2,flon2,del,azi
   do
      print *, "Enter 1st point lat,lon"
      read *,flat1,flon1
      print *, "Enter 2nd point lat,lon"
      read *,flat2,flon2
      call SPH_AZI(flat1,flon1,flat2,flon2,del,azi)
      print *, "del, azi = ", del, azi
   end do
end program userdist2
When we compile this program, we need to indicate where the SPH_AZI subroutine can be found. We want to link with what is called an object file for the
subroutines. Object files end in .o and there is a sphere_subs_f90.o file in the same
directory (~shearer/PROG/SUBS) as the source file sphere_subs.f.
You must link with object files that have been compiled using the same Fortran
compiler as you use for the main program. When I switched to using gfortran from
g77 a year or two ago, I had to recompile my subroutines in order for them to link
properly with gfortran compiled code. I still occasionally run into this problem when
I link to a subroutine I have not used in a while.
To create a F90 object file, enter, for example:
gfortran -c someprogram.f90 -o someprogram.o
You can also create a F90 object file from F77 source code:
gfortran -c anotherprogram.f -o anotherprogram.o
This will work on native F77 code that includes non-F90 compatible features
(e.g., comment lines starting with C), because the F90 compiler sees that the source
file ends in .f rather than .f90.
To compile userdist2 and link to sphere_subs.o, we enter:
gfortran userdist2.f90 /home/shearer/PROG/SUBS/sphere_subs.o -o userdist2
This quickly becomes cumbersome to write so you will find it convenient to set
up a Makefile to keep track of all this for you. Here is part of a Makefile that does
this for this program:
OBJS1= /home/shearer/PROG/SUBS/sphere_subs_f90.o

userdist2: Makefile userdist2.f90 $(OBJS1)
	gfortran userdist2.f90 $(OBJS1) -o userdist2
Note that you MUST use a tab to generate the spaces before gfortran for the
Makefile to work correctly! This is a leading source of confusion for novice Makefile
users.
You can list the full path names for any number of subroutine object files in this
way. To compile the program, simply enter:
make userdist2


ASSIGNMENT F9
Study the SPH_MID subroutine contained in sphere_subs.f (in ~shearer/PROG/SUBS).
Write a program that uses this subroutine to compute the midpoint between two
(lat,lon) points input by the user. Print out the latitude and longitude of this point.
Do not attempt to copy the SPH_MID source code, just link to the sphere_subs_f90.o
object file. Make sure that the argument list for SPH_MID in your program matches
the subroutine arguments. E-mail me the source code and also tell me where the
working executable version of the program is located. Be sure to give me execute
permission so that I can try running your program. (Remember: ls -l shows the
current permissions, chmod is how you change the permissions, man chmod will
explain this.) If you are at sea (!), then you may have to copy the sphere_subs.f routines to your local machine. You can make an object file from them using
gfortran -c sphere_subs.f -o sphere_subs.o

4.10

Internal procedures

The functions and subroutines that we discussed above are called external because
they are located outside of the main program. With external procedures all of the values
that are to be passed into and out of the procedure must be part of the argument list.
All other variables are local to the procedure or to the main program, even if they
have the same name as a variable in a different procedure. External procedures are
the only kind of procedure allowed in F77 and are what you will find in books like
Numerical Recipes or in the subroutine libraries maintained by various scientists at
SIO. Their great advantage is their portability: everything you need to know about
what they do is contained in their argument list.
However, in some cases it may be more convenient to use an internal subroutine
or function, a method that is new to F90. Internal procedures are listed immediately
BEFORE the end statement in the main program. All variable names are shared
within internal procedures; thus argument lists are not necessarily required for them
to work.
Here is an example adapted from the Brainard text that illustrates how internal
subroutines work:
program sort3
   implicit none
   real :: a1, a2, a3
   call read_numbers
   call sort_numbers
   call print_numbers

contains

   subroutine read_numbers
      print *, "Enter three numbers"
      read *, a1, a2, a3
   end subroutine read_numbers

   subroutine sort_numbers
      if (a1 > a2) call swap(a1,a2)
      if (a1 > a3) call swap(a1,a3)
      if (a2 > a3) call swap(a2,a3)
   end subroutine sort_numbers

   subroutine print_numbers
      print *, "The sorted numbers are:"
      print *, a1, a2, a3
   end subroutine print_numbers

   subroutine swap(x,y)
      real :: x, y, temp
      temp = x
      x = y
      y = temp
   end subroutine swap

end program sort3

Internal procedures are listed following a contains statement and before the end
statement for the main program. Variables already declared in the main program
need not be declared in the internal procedure. Arguments are optional; they are
used here in the swap routine to permit it to be used for different pairs from a1, a2
and a3. Note that the procedures are not nested (e.g., one might have tried to put
swap internal to sort_numbers); internal procedures may not contain other internal
procedures. Note also that variables declared within a subroutine are purely local
to the subroutine. For example, the value for temp (in swap) is not available in the
main program. Even if temp were declared in the main program, its value will not
correspond to the value of temp in the swap subroutine (try it!).
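A short sketch of this scoping behavior follows (the names scopedemo, change and shared are made up for illustration). The internal subroutine can see and modify shared, which belongs to the main program, but its own locally declared temp is a separate variable from the temp in the main program:

```fortran
program scopedemo
   implicit none
   real :: shared, temp
   shared = 1.0
   temp = -99.0
   call change
   print *, "shared = ", shared     ! 6.0, changed by the internal subroutine
   print *, "temp   = ", temp       ! still -99.0
contains
   subroutine change
      real :: temp                  ! local variable, hides the main program's temp
      temp = 5.0
      shared = shared + temp        ! shared is visible without an argument list
   end subroutine change
end program scopedemo
```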
The use of internal subroutines is rather pointless in this case because the code
would probably be clearer without them. However, for longer programs they may
well be useful for making the code more structured. If you write a program and
notice that you are using the same block of code over and over again, this would be a
situation where using a subroutine would make sense. The advantage of an internal
subroutine is that you don't have to match the argument lists and declare all of the
variables. External subroutines can become cumbersome when they involve a large
number of variables. Often common blocks are used to avoid long argument lists in
this case (more about this later). In many cases, internal subroutines may provide a
neater way to handle this situation.

4.11

Extended precision

By default, Fortran stores real variables using 4 bytes, providing a precision of
about 6 to 7 significant figures over a range from (on the Suns) 1.175494e-38 to
3.402823e+38. If this is insufficient precision, then variables can be defined in double
precision. In F77, this was done by declaring them as double precision or real*8
variables. This syntax will still work, although F90 has introduced a new concept,
the kind specifier for variables. All of these methods are demonstrated in this
program:
program testdouble
   implicit none
   character (len = 30) :: fmt = "(a5,f44.40)"
   real a4
   real*8 a8
   double precision aa
   real (kind=4) :: k4
   real (kind=8) :: k8
   real (kind=16) :: k16       !only include if using 64-bit machine + compiler
   a4 = 8.9
   print fmt, "a4 = ", a4
   a8 = 8.9d0
   print fmt, "a8 = ", a8
   aa = 8.9d0
   print fmt, "aa = ", aa
   k4 = 8.9
   print fmt, "k4 = ", k4
   k8 = 8.9_8
   print fmt, "k8 = ", k8
   k16 = 8.9_16                !only include if using 64-bit machine + compiler
   print fmt, "k16= ", k16     !only include if using 64-bit machine + compiler
end program testdouble

which produced the following output on my old Sun computer:
a4 =   8.8999996185302734375000000000000000000000
a8 =   8.9000000000000003552713678800500929355621
aa =   8.9000000000000003552713678800500929355621
k4 =   8.8999996185302734375000000000000000000000
k8 =   8.9000000000000003552713678800500929355621
k16=   8.900000000000000000000000000000000308148

and which produces the following output on my Mac running 32-bit gfortran:
a4 =   8.8999996185302734375000000000000000000000
a8 =   8.9000000000000003552714000000000000000000
aa =   8.9000000000000003552714000000000000000000
k4 =   8.8999996185302734375000000000000000000000
k8 =   8.9000000000000003552714000000000000000000
after I comment out the k16 lines.


Note that a4 and k4 are regular real*4 variables and approximate 8.9 as 8.8999996.
Much greater precision is obtained with a8, aa, and k8, which are all real*8 (double
precision) variables. k16 is a real*16 variable (quadruple precision).
NOTE: k16 does not currently work with 32-bit gfortran. Does it work on your
computer?
On the Suns, kind=4 is for single precision (4-byte real), kind=8 is for double
precision (real*8), and kind=16 is for 16-byte real. Note that the precision of numbers
may be specified by appending an underscore and the kind value to them.
Thus we write 8.9_8 to indicate that 8.9 is to be represented as an 8-byte real number.
The use of kind is intended to give you greater control over the degree of precision
in your programs and ultimately make codes that are less platform dependent
in their behavior. I'm not sure if this has been achieved! It appears that F90 also
allows complex variables to be declared as double precision, something not allowed in
F77. We will check if this is actually true later when we consider complex variables.
IMPORTANT NOTE: You must define the extra-precision variables using numbers
with the appropriate precision. If you write:
   a8 = 8.9          !***Not correct if a8 is real*8
you will get single precision accuracy. Even the dble function (which works in F77,
at least on the Suns) will not work in F90 in assigning variables:
   a8 = dble(8.9)    !***Not correct if a8 is real*8
will assign a8 only at single precision accuracy, because the constant 8.9 has
already been rounded to single precision before dble converts it. The key is to write:
   a8 = 8.9d0        !correct way to set double precision variables

or
k8 = 8.9_8
Note that the 0 following the d is the power of 10, thus 1.23d3 is 1230, 0.84d-2 is
0.0084, etc.
To get the full 16-byte precision (only on the Suns or 64-bit Macs), you must
say
k16 = 8.9_16
rather than k16=8.9d0 or k16=8.9_8, which will assign k16 only to double precision
accuracy.
Many of my existing F77 routines use dble( ) to define double precision numbers.
These will not work correctly if they are changed to F90, unless all of the dble( )
commands are rewritten. They are fine, however, if they continue to be compiled
using F77. I don't know if this is a bug in Sun F90, or if the use of dble( ) in this
way is non-standard Fortran.
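The following minimal sketch isolates the point of the IMPORTANT NOTE above. It is not one of the class programs: the program name literalkind and the kind parameter dp are made up for illustration. Rather than hardwiring kind=8, it uses the standard intrinsic function selected_real_kind to ask for a kind with at least 15 significant digits, which is one way the kind mechanism can make code less platform dependent.

```fortran
program literalkind
   implicit none
   ! dp is a hypothetical name; selected_real_kind(15) requests a real
   ! kind with at least 15 significant decimal digits
   integer, parameter :: dp = selected_real_kind(15)
   real (kind=dp) :: a, b, c
   a = 8.9            ! single precision constant, converted afterwards
   b = dble(8.9)      ! same problem: 8.9 is rounded before dble sees it
   c = 8.9_dp         ! constant written at the precision of the variable
   print "(a,f20.16)", "a = ", a
   print "(a,f20.16)", "b = ", b
   print "(a,f20.16)", "c = ", c
end program literalkind
```

With gfortran and default compiler options, a and b should agree with 8.9 only to about 7 significant figures, while c should be good to about 16.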
ASSIGNMENT F10
Examine the datetime.f source code in shearer/PROG/SUBS (also see class
website) and write an F90 program to compute the number of seconds that have
elapsed since noon on your birth date (or the exact time if you know it) until a
user-specified date and time. Have the program print out this number of seconds.
You will want to use the DT_TIMEDIF subroutine. Make sure that all of the
variables match and are of the same type (integer, real, and real*8 for timdif, the
final argument). You should also be aware that & in column 6 of F77 code is how
lines are continued, that is:
      subroutine DT_TIMEDIF(iyr1,imon1,idy1,ihr1,imn1,sc1,
     &                      iyr2,imon2,idy2,ihr2,imn2,sc2,timdif)
is actually all one line.
E-mail me the source code of your program.
Extra credit: Determine the date and time when you will be (or were) exactly one
billion seconds old. This will require using one of the other subroutines in datetime.f.
Mark your calendar for a party!

4.11.1 Integer sizes

Just as you can set aside fixed numbers of bytes for real numbers, you can do
the same to store integers of varying sizes. The standard is 4-byte integers, which
have 32 bits and thus will go (approximately) from -2^31 to 2^31. However, 2-byte
and 8-byte integers are also allowed. These are sometimes called short and long
integers, respectively. The following example program shows how this works, using
two different ways to define the integers:
program testinteger
implicit none
integer*2 i2
integer i4
integer*8 i8
integer (kind=2) :: k2
integer (kind=4) :: k4
integer (kind=8) :: k8
i8 = 32000
i4 = i8
i2 = i8
print *, i2, i4, i8
i8 = 32000**2
i4 = i8
i2 = i8
print *, i2, i4, i8
i8 = 32000
i8 = i8**4
i4 = i8
i2 = i8
print *, i2, i4, i8
end program testinteger
This will produce the output:
   32000        32000                 32000
       0   1024000000            1024000000
       0            0   1048576000000000000

The zero fields in the output are incorrect and indicate that the number was too
large to be stored in the given variable type. Note that k2, k4 and k8 (defined using
the kind statement) will give the same results, even though they are not explicitly
tested in the program.
Regular integers suffice for most purposes. Short integers are useful to save space
for data sets that don't require bigger numbers (i.e., beyond about 32000).
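To see the actual limits of each integer size on your own machine, a short sketch like the following can be used. It is not part of the class programs; the program name and the kind parameter names are invented here. It relies on two standard F90 intrinsics: selected_int_kind, which requests a kind by decimal range rather than by byte count, and huge, which returns the largest value a variable of a given kind can hold.

```fortran
program intrange
   implicit none
   ! request kinds by the number of decimal digits they must hold
   integer, parameter :: k2 = selected_int_kind(4)     ! at least 10**4
   integer, parameter :: k4 = selected_int_kind(9)     ! at least 10**9
   integer, parameter :: k8 = selected_int_kind(18)    ! at least 10**18
   integer (kind=k2) :: i2
   integer (kind=k4) :: i4
   integer (kind=k8) :: i8
   ! huge() only inquires about the kind, so the variables need no values
   print *, "largest i2 = ", huge(i2)
   print *, "largest i4 = ", huge(i4)
   print *, "largest i8 = ", huge(i8)
end program intrange
```

On most current compilers these three kinds correspond to 2-, 4-, and 8-byte integers, in which case the limits printed should be 32767, 2147483647 (2^31 - 1), and 9223372036854775807 (2^63 - 1).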

4.12 Arrays

Arrays are defined in F90 as in this example:


real, dimension(100) :: a, b
integer, dimension(50,2) :: index
which defines a and b as 100 element arrays (a(1) to a(100)) and index as a 50x2
element array (index(1,1) to index(50,2)). Note that the default starting array index
is 1 (not 0 as in C). Also note that array elements are written using parentheses,
not brackets as in C.
Older Fortran programs would define the same arrays as follows:
real a(100), b(100)
integer index(50,2)
This syntax will still work and is easier to read for short programs. The new standard
has the advantage, however, of being able to easily define many arrays with identical
dimensions.
A nice feature of Fortran is that we can specify the lower and upper array
boundaries explicitly in the declaration, e.g.,
real, dimension(-100:100) :: a, b
integer, dimension(0:50, 2) :: index
In this case there are 201 values for a and b (from a(-100) to a(100)) and 102
values for index (from index(0,1) to index(50,2)).
Here is an example program that uses an array to compute prime numbers less
than 100:


program prime
   implicit none
   integer, parameter :: maxnum=100
   integer :: i, j, prod(maxnum), max_i, max_j, nprime=0
   do i = 1, maxnum
      prod(i) = 0
   enddo
   max_i = floor(sqrt(real(maxnum)))
   do i = 2, max_i
      if (prod(i) == 0) then
         max_j = maxnum/i
         do j = 2, max_j
            prod(i*j) = 1
         enddo
      end if
   end do
   do i = 2, maxnum
      if (prod(i) == 0) then
         nprime = nprime + 1
         print "(i4)", i
      end if
   enddo
   print *, "Number of primes found = ", nprime
end program prime

The method is sometimes called the sieve of Eratosthenes, named after a Greek
mathematician from the 3rd century BC. We start with a list of numbers from 2
to 100 and consider them all possible primes. In the program, this list is the array
prod. We initialize the array by setting its values to zero. The strategy is to then
flag numbers that are not prime by setting the corresponding array values to one.
We then start with the number 2 and eliminate all multiples of 2 up to the
maximum value of 100. We then move to 3 and eliminate (i.e., set prod to 1) all
multiples of 3. We can skip 4 because 4 and all its multiples were already eliminated.
We need to check numbers, i, only up to sqrt(100) because any larger non-prime
number would already have been eliminated as a multiple of a smaller factor.
When we are finished, we simply print out all numbers that were not eliminated.
In our case, this is all i such that prod(i) = 0. We count the number of primes using
the counter variable nprime.
A new aspect of this program is the defining of maxnum as a parameter:


integer, parameter :: maxnum=100


This tells the program to set maxnum to 100 and that this value will never be
changed during the program. It also allows the array dimension for prod (in the
next line) to be set by maxnum. This makes it easy for us to change the size of our
prime search by changing the value of maxnum without having to change anything
else in the program.
Note that in the statement
max_i = floor(sqrt(real(maxnum)))
it is necessary to convert maxnum to a real number before taking the square root.
This is because the argument to the Fortran sqrt function must be real; if we had
written sqrt(maxnum) we would have gotten an error during compilation.
This program lists the prime numbers with one number per output line and thus
will become cumbersome for larger values of maxnum. Let's modify the program to
list 10 primes per line. To do this, we save the prime numbers in a separate array
called pnum. Here is the code:
program prime2
   implicit none
   integer, parameter :: maxnum=1000
   integer, dimension(maxnum) :: prod, pnum
   integer :: i, j, max_i, max_j, nprime=0
   do i = 1, maxnum
      prod(i) = 0
   enddo
   max_i = floor(sqrt(real(maxnum)))
   do i = 2, max_i
      if (prod(i) == 0) then
         max_j = maxnum/i
         do j = 2, max_j
            prod(i*j) = 1
         enddo
      end if
   end do
   do i = 2, maxnum
      if (prod(i) == 0) then
         nprime = nprime + 1
         pnum(nprime) = i
      end if
   enddo
   print *, "Number of primes found = ", nprime
   print "(10i5)", (pnum(i), i = 1, nprime)
end program prime2


Note that we use the new F90 convention for setting up the arrays. The program
is pretty self-explanatory, but we do introduce a new concept in the output line:
print "(10i5)", (pnum(i), i = 1, nprime)
This is termed an implied do loop and is very useful in input/output (I/O)
statements. Note that the format specifier 10i5 applies to the first 10 numbers and
then is reused for the next 10, etc. The parentheses around (pnum(i), i=1,nprime) are
required. More complicated expressions are also possible, e.g.,
print "(5(i4,i4,2x))", (i, pnum(i), i = 1, nprime)
will print the primes to the right of their count.
Array values can be assigned one element at a time, or can be assigned in single
statements as in this F90 example:
program testarray
implicit none
integer, dimension(3) :: x = (/ 1, 2, 3 /), y = 1
print *, x
x = (/ 15, 30, 40 /)
print *, x
print *, y
y = 2
print *, y
end program testarray
and its output:
1 2 3
15 30 40
1 1 1
2 2 2
Note that when an array is set to a scalar, every array element is set to the
value of the scalar, i.e., y = 2 sets all y values to 2. The individual elements can be
specified if they are listed between (/ and /) (Warning: Don't put a space between
the / and the parenthesis!!). The number of values must match the dimension of the
vector. We will discuss this more later when we consider two-dimensional arrays in
detail.
So we see that we could have written:
integer, dimension(maxnum) :: prod = 0, pnum
in program prime2 rather than wasting the three lines we used later to set all elements
in prod to zero.
There are some cases when we would prefer not to advance to the next line
following a print statement. Instead, we would like to have output continue on the
same line. Here is an example of how this can be used to print 10 prime numbers
per line without having to store the prime numbers in a separate array:
program prime3
   implicit none
   integer, parameter :: maxnum=1000
   integer :: i, j, prod(maxnum)=0, max_i, max_j, nprime=0
   max_i = floor(sqrt(real(maxnum)))
   do i = 2, max_i
      if (prod(i) == 0) then
         max_j = maxnum/i
         do j = 2, max_j
            prod(i*j) = 1
         enddo
      end if
   end do
   do i = 2, maxnum
      if (prod(i) == 0) then
         nprime = nprime + 1
         if (mod(nprime,10) /= 0) then
            write (*,"(i4)", advance="no") i
         else
            write (*,"(i4)") i
         end if
      end if
   enddo
   print *
   print *, "Number of primes found = ", nprime
end program prime3
The statement
   write (*, "(i4)") i
works exactly the same as the print statement we used previously. (print only goes
to the screen; write can go to the screen or to a file. The * in the above write
statement means standard output, not free format.)
The statement
   write (*, "(i4)", advance="no") i
does the same thing, except that it does not advance to the next line. The program
checks to see if the prime count (nprime) is divisible by 10; if not then the output
does not advance. Following the enddo, a print * statement is needed to be sure
that the final output ("Number of primes found") is on a separate line.
NOTE: With the Suns, under both F77 and F90, the carriage return (advancement to next line) can also be suppressed simply by adding a $ to the format
statement. Thus the above line could be replaced with
print "(i4,$)", i
and the result would be the same. However, I don't think this is standard Fortran,
so you may get into trouble at some point with this convention.

4.12.1 Checking for problems with the -fcheck=bounds option

Suppose when writing the prime number program, we made the mistake of writing:
max_j = maxnum
rather than
max_j = maxnum/i
When the program is run, at some point the product i*j in the line
prod(i*j) = 1
will exceed the array dimensions of prod (maxnum). The program will then start
overwriting adjacent memory locations and disaster will result. Most commonly the
program will crash with the following kind of message:


Bus error
or
Segmentation Fault
These types of errors result from exceeding array boundaries or mismatched
subroutine arguments. These are not very helpful error messages because they do
not tell you where in the program the problem occurred. However, there is usually a
compiler option you can set that will provide more useful diagnostics. For gfortran,
it is the -fcheck=bounds option and can be invoked either by compiling prime4 with:
gfortran -fcheck=bounds prime4.f90 -o prime4
or changing the Makefile to:
%: %.f90
	gfortran -fcheck=bounds $< -o $*

(remember the tab before gfortran!) If you have compiled your program with this
option, then the computer will check to see if your array indices exceed their boundaries and tell you where the problem occurs:
shearer@katmai 324> ./prime4
At line 11 of file prime4.f90
Fortran runtime error: Array reference out of bounds for array 'prod',
upper bound of dimension 1 exceeded (1002 > 1000)
This is so helpful, and will save you so much time in debugging your programs,
that I recommend that you ALWAYS use this option when you are first getting your
programs to work. It does slow the code somewhat, so for programs where run time
is a factor, you should then recompile without -fcheck=bounds for the final working
version. (In these cases, you may also want to experiment with the -O compiler option to
make your code run faster; see below.)
ASSIGNMENT F11
Fortran 90 includes a built-in random number subroutine (called random_number).
Here is an example program that prints out 20 random values between 0 and
1.
program listrand
integer :: i
real :: xran
do i = 1, 20
call random_number(xran)
print *, xran
enddo
end program listrand
Write a simple F90 program to generate 10000 random numbers between 0 and
1. Test the randomness by counting the number of these that fall in each of 10
evenly spaced bins between 0 and 1 (i.e., 0 to 0.1, 0.1 to 0.2, etc.). Print these
counts to the screen. Here is an example of what your output might look like:
   0   1009
   1   1048
   2   1001
   3   1038
   4   1008
   5    959
   6    993
   7    925
   8   1017
   9   1002
HINT: Set up an integer array with 10 elements and initialize all the values to
zero. Then, for each random number generated, add one to the appropriate array
element to keep track of how many of the numbers are in that bin.

4.12.2 More about random numbers

To learn more about random number generators, please consult the book Numerical
Recipes, which has a discussion about good algorithms and bad algorithms. I have
not tested the built-in F90 random number generator to see how well it measures
up, but it seems pretty good for most purposes.
You may notice that if you use random_number in a program, you will get the
same numbers every time you run the program. This is desirable in many cases. For
example, if you are debugging a program, you want any problems to be completely
reproducible. But in other cases, you will want to get different results each time.
One simple way to achieve this would be to ask the user to input an integer and
then compute that many random numbers before continuing to the main program.
But this requires the user to think of different input numbers. To avoid this, we can
randomize the start of the random number generator itself by using some number
that will always be different, for example from the system clock on the computer.
This turns out to be annoyingly hard to do in F90. Here is an example program:
program listrandseed
   implicit none
   integer :: i, count, count_rate, count_max, narg
   integer, dimension(8) :: idate
   integer, dimension(12) :: seed
   real :: xran
   ! randomize start of random numbers
   call date_and_time(VALUES=idate)
   print *, "idate = ", idate
   call system_clock(count, count_rate, count_max)
   seed(1) = count
   seed(2:9) = idate(8:1:-1)
   seed(10:12) = idate(6:8)
   print *, "seed = ", seed
   call random_seed(SIZE=narg)
   print *, "narg = ", narg
   call random_seed(PUT=seed)
   do i = 1, 20
      call random_number(xran)
      print *, xran
   enddo
end program listrandseed
Once you have this working, you may want to remove the print statements in
the random seed part. But before doing so, make sure that seed is dimensioned to
narg. When I recently changed computers, narg changed from 8 to 12 and I had to
rewrite this little program.
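One way to avoid rewriting the program when narg changes is to size the seed array at run time. The sketch below is only an illustration, not a tested replacement for the program above: the program name is made up, it uses an allocatable array (a feature not yet covered in these notes), and the formula used to fill the seed is an arbitrary choice rather than a recommended one.

```fortran
program listrandseed2
   implicit none
   integer :: i, narg, count
   integer, allocatable :: seed(:)
   real :: xran
   call random_seed(SIZE=narg)          ! how many integers the seed needs
   allocate (seed(narg))                ! size the seed array to match
   call system_clock(count)             ! clock count, differs between runs
   do i = 1, narg
      ! arbitrary way to make every seed element depend on the clock
      seed(i) = mod(count, 100000) + 37*i
   enddo
   call random_seed(PUT=seed)
   do i = 1, 5
      call random_number(xran)
      print *, xran
   enddo
end program listrandseed2
```

Because seed is always allocated with exactly narg elements, the call to random_seed(PUT=seed) should remain valid when the compiler or computer changes.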

4.12.3 Arrays as subroutine arguments

Suppose we have defined an array x as:


integer :: x(100)
If we use x in the argument list for a subroutine, e.g.,
call SUMTOT(x,n,y)


we include x without any parentheses. The subroutine argument list must also
include a matching array of the same size. All array values are passed to and from
the subroutine. We will discuss later how to write subroutines that will work for
arrays of differing size in the calling program.
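The notes call SUMTOT without listing it, so here is a hypothetical version written as an internal subroutine to show the matching declarations. The body (summing the first n elements of x into y) is an assumption made for illustration, as is the program name testsum.

```fortran
program testsum
   implicit none
   integer :: x(100), n, y, i
   n = 100
   do i = 1, n
      x(i) = i
   enddo
   call SUMTOT(x, n, y)          ! x is passed without parentheses
   print *, "Sum = ", y
contains
   subroutine SUMTOT(a, m, total)
      ! a must be declared with the same size as x in the main program
      integer :: a(100), m, total, k
      total = 0
      do k = 1, m
         total = total + a(k)
      enddo
   end subroutine SUMTOT
end program testsum
```

With x filled with the integers 1 to 100, the sum printed should be 5050.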

4.13 Character strings

A character string may be declared in F90 as follows:


character (len = 20) :: name
which declares the variable name to be a character string of length 20. One can also
define an array of character strings:
character (len = 20), dimension(100) :: name_array
The old F77 ways to define these strings will still work in F90:
character name*20, name_array(100)*20
or
character*20 name, name_array(100)
A string variable can be assigned as, e.g.,
name = "Bill Clinton"
The quotes (apostrophes will also work) are required. If name in this case was
declared as 20 characters long, then trailing blanks are added so name actually is
"Bill Clinton        ". If, on the other hand, name was declared as 10 characters long,
then name would contain "Bill Clint" as the extra letters would be cut off (no error
results).
The length of a character string may be obtained with the len function:
   len("Bill") will give 4
   len("  ") will give 2
   len(name) will give 20 if name was declared as 20 characters
             long, regardless of any trailing blanks
Substrings may be obtained as follows for name = "Bill Clinton":
   name(1:4) is "Bill"
   name(6:12) is "Clinton"
   name(6:6) is "C"
Thus, we could change the last name as follows:
name = "Bill Clinton"
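name(6:12) = "Jones  "
Note that the two trailing blanks overwrite the "on" at the end of "Clinton". (Fortran
also pads with blanks automatically when the assigned string is shorter than the substring.)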
The following moves the string one character to the right:
name(2:13) = name(1:12)
This was not allowed in F77 (because of the overlap) but is OK in F90.
Often one wants to know the length of a string without any trailing blanks. This
can be done with the built-in function len_trim. Another built-in function is called
trim, which gives the string without any trailing blanks.
To find the position of a substring within a string, the built-in function index can
be used:
index("Clinton", "in") returns 3
index("Clinton", "on") returns 6
index("Clinton", "no") returns 0 (indicates string not found)
To append one string onto the end of another (concatenation), the // operator
can be used:
   text1 = "Bill Clinton" // " was elected in 1992."
sets text1 to the complete sentence (text1 must have been already declared of sufficient
length). Note the blank before "was" is necessary for there to be a space
between the words.
When trailing blanks are present, the trim function is very useful. For example,
if name is declared as 20 characters long, then
name = "Bill Clinton"
text1 = name // " was elected in 1992"
sets text1 to:
   Bill Clinton         was elected in 1992
while
   text1 = trim(name) // " was elected in 1992"
will produce the correct result.
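The string functions described in this section can be tried together in a short test program. This is only a sketch for experimenting; the program name trimdemo and the declared lengths are arbitrary choices.

```fortran
program trimdemo
   implicit none
   character (len = 20) :: name
   character (len = 60) :: text1
   name = "Bill Clinton"
   ! declared length, then length without the trailing blanks
   print *, len(name), len_trim(name)
   text1 = trim(name) // " was elected in 1992."
   ! the brackets make any stray blanks visible
   print *, "[", trim(text1), "]"
   ! position at which the last name starts
   print *, index(name, "Clinton")
end program trimdemo
```

The first line of output should be 20 and 12, the second should show the sentence with a single space after "Clinton", and the third should be 6.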
Here is an example program that illustrates how to input a string:
program vote
   implicit none
   character (len = 80) :: name
   print *, "Who did you vote for in 1992?"
   read "(a)", name
   if (index(name,"ill") /= 0 .or. index(name,"lin") /= 0) then
      print *, "Then you are likely a Democrat."
   else if (index(name,"eor") /= 0 .or. index(name,"ush") /= 0) then
      print *, "Then you are likely a Republican."
   else
      print *, "Then you are likely an independent voter."
   end if
end program vote
Note the format for the read statement:
read "(a)", name
does not need to use a80 although that would also work. The free format read
statement:
read *, name
is not recommended in this case because it will only read until the first blank. Thus
"Bill Clinton" would be read in as "Bill" (unless the full name were enclosed in quotes,
but who wants to make the user do that?).
ASSIGNMENT F12
Write a simple version of the famous 1960s program, Eliza, that simulates a
psychologist talking to a patient. A humorous fictional account of such a program
is contained in Small World, the David Lodge satire of academia. There is an
online version of the original program at: http://chayden.net/eliza/Eliza.html
Programs of this type are now called bots and examples can be found at
alicebot.org.
Begin your program by asking the patient:
   "How are you?"
Then read a line from the user/patient. Use the index function to search this
line for various key words that you can then use to guide the computer's response.
Design the program so that it will continue this conversation between doctor and
patient.
Some suggestions:
If the input string contains a "?", then respond:
"Let me ask the questions. Why is it important to you?"
If the string contains "!", then respond:
"You seem pretty upset. What is really bothering you?"
If the string contains "mother" then respond:
"Tell me more about your mother."
If the string contains any swear words, respond:
"There is no need for that kind of language."
If you identify no key words on your list, then respond
with something generic like:
"Go on." or "Tell me more about it."
Be creative but don't spend an infinite amount of time on this assignment. Your
program will be more realistic if you use random numbers (see above) to randomly
select from a series of possible responses so that the computer does not keep saying
the same thing.

4.14 I/O with files

So far, all of our examples have involved input from the keyboard and output to
the screen. Now let us see how to open files, read from them, and write data to
them. Here is a program that reads pairs of numbers from an input file, computes
their product, and outputs the original numbers and their product to an output file.
program fileinout
   implicit none
   character (len=100) :: infile, outfile
   integer :: i, ios
   real :: x, y, z
   print *, "Enter input file name"
   read "(a)", infile
   open (11, file=infile, status="old")
   print *, "Enter output file name"
   read "(a)", outfile
   open (12, file=outfile)
   do i = 1, 999999999       !more lines than any likely input file
      read (11,*, iostat = ios) x, y
      if (ios < 0) then
         exit
      else if (ios > 0) then
         print *, "***Warning, read error on line: ", i
         cycle
      endif
      z = x * y
      write (12,*) x, y, z
   enddo
   close (11)
   close (12)
end program fileinout
We open the input file with the statement:
   open (11, file=infile, status="old")
Files must be assigned unit numbers for I/O. This statement opens the file with
unit 11. On most systems, unit 5 is defined as the default standard (keyboard) input
and unit 6 is defined as the default standard (monitor) output, so these numbers
should always be avoided as unit numbers. I have gotten into the habit of using
units 11, 12 and 13 for most of my I/O.
file=infile is what provides the filename. Note that we could hardwire a file
name here by writing:
   open (11, file="file1", status="old")
The status="old" is optional, but I recommend always using it for opening files which
should already exist. When you say status="old" and the file does not exist, then the
program will terminate and print an error condition. If you have not set status="old"
then the program will create a new file and when you try to read from it, you will
reach the end of the file immediately and the program will terminate without any
obvious errors (except that you will now have zero length files in your directory!).
Another value for status is:
   status="new"    file must not already exist (useful to avoid
                   overwriting existing files)

However, in this example we choose to simply write:


open (12,file=outfile)
so that we can overwrite existing file names. This assigns unit 12 to the output file
(the input and output file unit numbers must be different!).
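The open statement also accepts an iostat specifier, which lets the program report a missing input file itself rather than terminating with a system error message. The following is a minimal sketch of that idea and is not taken from the class programs; the program name openchk and the wording of the messages are made up.

```fortran
program openchk
   implicit none
   character (len=100) :: infile
   integer :: ios
   print *, "Enter input file name"
   read "(a)", infile
   ! with iostat present, a failed open sets ios to a nonzero value
   ! instead of stopping the program
   open (11, file=infile, status="old", iostat=ios)
   if (ios /= 0) then
      print *, "***Cannot open file: ", trim(infile)
      stop
   end if
   print *, "Opened ", trim(infile)
   close (11)
end program openchk
```

The same test could be repeated in a loop so that the user gets another chance to type the file name.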
We read data from the input file with the statement:
read (11, *, iostat = ios) x, y
This does a free-format read from unit 11 (for a fixed format, the * is replaced
with a format specifier such as "(i2, 2f6.2)" etc.). Because we do not know, a priori,
how long the input file is, we need some way to recognize when we have reached the
end of the file (EOF). The iostat = ios specifier does this by assigning to the local
variable ios (which we must declare as an integer) a value that is zero for a normal read,
positive if there is an error during the read, and negative if the end of the file is
reached. Note that we are free to choose any name for ios that we want as long as
it is declared as an integer.
Once ios has been set, we check its values to see if there is an EOF or error
condition:
   if (ios < 0) then
      exit
   else if (ios > 0) then
      print *, "***Warning, read error on line: ", i
      cycle
   endif
For ios < 0, we exit the do loop because there are no more lines to read. For ios >
0, we print a warning message and specify the line number. Being able to print the
line number is useful because otherwise we would not know where to look for the
problem in a big input file. This is why we use i in the do loop. Following the
warning message, we use cycle to skip this line and read the next line. Without
the cycle, the program would write an output line using the same values of x
and y used for the previous line, the result being an erroneous duplicate line in the
output file.
We should note at this point that older Fortran programs do not use this method
of testing for the end of the file. Rather, they will say something like:

read (11, *, end = 123) x, y

where 123 is a statement label that the program will branch to when the end of file
is reached. This method still works in F90 but you lose style points for having to use
a statement label. However, despite this there is one aspect in which the old way of
doing things is better than the new way. Suppose there is an error in the input file,
for example a stray character in one of the input lines instead of two numbers. In
the old way of doing things, the program will crash on the faulty input. In the new
approach, using the iostat= convention, any read errors are ignored. You never
want to ignore errors without at least being aware of them. So if you use
iostat= then you always should explicitly look for errors. In other words, you should
never write code like this:
! example of what NOT to do!
do
read (11,*, iostat = ios) x, y
if (ios < 0) exit
z = x * y
write (12,*) x, y, z
enddo
In this case, any input error would be flagged by assigning a positive number to ios.
But this code does not check for this and will therefore continue running, retaining
the previous values for x and y. The result will be that the output file will repeat
the results from the previous line. This is not good, because we have introduced
an error into the output that might not be immediately obvious. In the old way of
doing things, however, the program crashes on the faulty input.
Continuing to examine the program fileinout.f90, we write the original numbers
and their product using a free format write to unit 12:
write (12,*) x, y, z
Note that x and y will probably not be in the same format as in the input file.
Finally, after the EOF but before we end the program it is good practice to close
the files:
close (11)
close (12)


although the files will be closed automatically anyway when the program ends.
ASSIGNMENT F13
Write a program that reads from a text file containing 5 numbers per line.
Compute the mean of the 5 numbers and then subtract the mean from each of the 5
original numbers. Output the 5 demeaned numbers to an output file. Allow the
user to specify the input and output file names.
For example, if the input file contains:
   10 20 30 40 50
    1  2  3  4  5
    2  4  6  8 10
    5  5  5  5  5

your program should write to the output file:
  -20.000  -10.000    0.000   10.000   20.000
   -2.000   -1.000    0.000    1.000    2.000
   -4.000   -2.000    0.000    2.000    4.000
    0.000    0.000    0.000    0.000    0.000

Hint: Use the fileinout.f90 program listed above as a template for the file
handling part of your code.

4.15 More about multi-dimensional arrays

Two-dimensional arrays in Fortran may be defined as follows:
   integer a(2,3)                    !F77 convention
   integer, dimension(2,3) :: a      !F90 convention
This defines a matrix in which the first index will vary from 1 to 2, and the second
from 1 to 3 (recall that the default starting array index is 1 in Fortran).
Here is a test program that illustrates how the numbers in a can be specified
and how they are stored in memory:
program testmatrix
implicit none
integer, dimension(2,3) :: a
a(1, 1:3) = (/ 1, 2, 3 /)
a(2, 1:3) = (/ 4, 5, 6 /)
print *, a
end program testmatrix

The output of this program is:
   1 4 2 5 3 6
First, note how the array values are specified. (/ 1, 2, 3 /) is an example of what
is called an array constructor and is a sequence of scalar values along one array
dimension only. We equate this to the desired part of the a matrix by writing, for
example:
a(1, 1:3) = (/ 1, 2, 3 /)
thus setting a(1,1)=1, a(1,2)=2, and a(1,3)=3.
Finally note how the array is printed. The print *, a statement will dump the
values of a in the same order that they are stored in memory. In doing this, Fortran
varies the first array index first, then the second, etc. In this case, the result is:
   a(1,1), a(2,1), a(1,2), a(2,2), a(1,3), a(2,3)
Similarly a three-dimensional array b(2,2,2) would be stored as:
b(1,1,1)
b(2,1,1)
b(1,2,1)
b(2,2,1)
b(1,1,2)
b(2,1,2)
b(1,2,2)
b(2,2,2)
This is important to remember when passing arrays to and from subroutines
for cases when the array in the main program has more dimensions than the array
in the subroutine. For example, suppose we wished to store 100 seismograms of
1000 points each in the main program. We have a filtering subroutine that will act
on a single time series vector. If we dimension the array in the main program as
b(1000,100) then we can call the subroutine using a statement like:
call FILTER(b(1,i),npts)
where i is the seismogram number. The filter can then act on b(1:npts, i) and would
be of the form:


subroutine FILTER(c, n)
real, dimension(n) :: c
.
.
This is a very common type of construction. Note that this would not work if
the array were dimensioned as b(100,1000) because then each 1000-point time series
would not be contiguous in memory.
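Here is a small, self-contained sketch of the same construction. The subroutine name DEMEAN and the tiny array sizes are made up for illustration (DEMEAN stands in for FILTER and simply removes the mean from one trace); the point is only that b(1,2) hands the subroutine the start of a column whose values sit next to each other in memory.

```fortran
program passcolumn
   implicit none
   integer, parameter :: npts = 4, ntrace = 3   ! illustrative sizes only
   real, dimension(npts, ntrace) :: b
   integer :: i, j
   ! fill each column (one "trace" per column) with simple values
   do j = 1, ntrace
      do i = 1, npts
         b(i, j) = real(10*j + i)
      enddo
   enddo
   ! pass the second trace; its npts values are contiguous in memory
   call DEMEAN(b(1, 2), npts)
   print *, "trace 2 after removing mean: ", b(1:npts, 2)
end program passcolumn

! hypothetical stand-in for FILTER: removes the mean of a single vector
subroutine DEMEAN(c, n)
   implicit none
   integer :: n
   real, dimension(n) :: c
   c = c - sum(c)/real(n)
end subroutine DEMEAN
```

If b were instead dimensioned (ntrace, npts), the same call would hand DEMEAN a mixture of samples from different traces.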
Here is an example program to multiply two matrices:
program matmult
implicit none
real, dimension(3,3) :: a, b, c
integer :: i, j, k
a(1,1:3) = (/ -5.1, 3.8, 4.2 /)
a(2,1:3) = (/ 9.7, 1.3, -1.3 /)
a(3,1:3) = (/ -8.0, -7.3, 2.2 /)
b(1,1:3) = (/ 9.4, -6.2, 0.5 /)
b(2,1:3) = (/ -5.1, 3.3, -2.2 /)
b(3,1:3) = (/ -1.1, -1.8, 3.0 /)
do i = 1, 3
do j = 1, 3
c(i,j) = 0.0
do k = 1, 3
c(i,j) = c(i,j) + a(i,k) * b(k,j)
enddo
enddo
enddo
print *, "Matrix a follows"
call PRINTMAT(a)
print *, "Matrix b follows"
call PRINTMAT(b)
print *, "Matrix c = a*b follows"
call PRINTMAT(c)
contains
subroutine PRINTMAT(x)
real, dimension(3,3) :: x
do i = 1, 3
print "(3f8.3)", (x(i,j), j=1,3)
enddo
end subroutine PRINTMAT
end program matmult


Note the use of the internal subroutine PRINTMAT to display the results.
This example performs the matrix multiplication exactly as one would in F77; it
does not take advantage of some of the new array capabilities of F90. In fact, there
is a built in function, MATMUL, in F90 that performs matrix multiplication. My
plan is to have more to say about these special F90 capabilities later in the class.
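As a quick preview, here is a minimal sketch (the program name matmulcheck and the use of random test matrices are my own choices for illustration) that computes the product both ways and prints the largest difference between the two results:

```fortran
program matmulcheck
   implicit none
   real, dimension(3,3) :: a, b, c, d
   integer :: i, j, k
   call random_number(a)      ! fill a and b with random test values
   call random_number(b)
   ! explicit triple loop, exactly as in matmult
   do i = 1, 3
      do j = 1, 3
         c(i,j) = 0.0
         do k = 1, 3
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
         enddo
      enddo
   enddo
   ! the intrinsic function does the same job in one statement
   d = matmul(a, b)
   print *, "largest difference = ", maxval(abs(c - d))
end program matmulcheck
```

The difference should be zero or at the level of single-precision roundoff, since the two versions may add the terms in a different order.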
ASSIGNMENT F14
Modify the matmult program so that the matrix multiplication is done in a
subroutine. Add another subroutine that computes the transpose of a square matrix.
Compute the transpose of c and add this to the output listing.

4.15.1 Arrays of strings

As we have discussed, a string is just an array of characters. Thus, a 1-D array of
strings will be a 2-D character array. Here is an example of this:
program stringarray
implicit none
character (len = 80), dimension(10) :: name
name(1) = "Bill Clinton"
name(2) = "George Bush"
print *, "Our last president was ", trim(name(1))
print *, "Our current president is ", trim(name(2))
end program stringarray
The program can store up to 10 names of 80 characters each. Note the use of the
trim function when printing the names. This is very handy to cut off the trailing
blanks and limit unnecessary extra spaces (and sometimes lines) in the output.
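The following short sketch (program name trimdemo is just for illustration) shows the difference between the declared length of a string and its trimmed length, and how to pick out a substring of one element of a string array. The brackets in the output make the trailing blanks visible.

```fortran
program trimdemo
   implicit none
   character (len = 20), dimension(2) :: name
   name(1) = "Bill Clinton"
   name(2) = "George Bush"
   ! len gives the declared length, len_trim the length without trailing blanks
   print *, "len      = ", len(name(1))
   print *, "len_trim = ", len_trim(name(1))
   ! name(2)(1:6) is characters 1 to 6 of the second string in the array
   print *, "[", name(2)(1:6), "]"
   ! untrimmed versus trimmed output
   print *, "[", name(1), "]"
   print *, "[", trim(name(1)), "]"
end program trimdemo
```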

4.16 A more complex example of data processing

Suppose we have a data file (e.g., datafile1) that contains a series of measurements
that has the following form:
980204 22:03:34.42
   10.8
   10.2
   9.8
   11.6
980205 08:45:22.20
   5.5
   7.2
   6.6
980205 20:13:42.88
   15.0
   13.8
   12.9
   14.2
etc.
The structure is that there are date/time identifiers and then a series of measurements made at that time. The problem in reading this file is that we don't know how
many measurement lines will follow each date/time line so we can't simply perform
a formatted read on each line from the file.
Let us assume that we want to compute the mean of these measurements and
write a new file which stores this information on the same line as the date/time. In
this case, our desired output file (e.g., datafile2) will look like:
980204 22:03:34.42    10.60    4
980205 08:45:22.20     6.43    3
980205 20:13:42.88    13.97    4

where the final column gives the number of measurements.


Here is how such a program could be written:
program procfile
   implicit none
   character (len=100) :: infile, outfile
   character (len=20) :: linebuf, event
   integer :: ios, nevent=0, ndata=0
   real :: x, sum=0.0, xavg
   print *, "Enter input file name"
   read "(a)", infile
   open (11, file=infile, status="old")
   print *, "Enter output file name"
   read "(a)", outfile
   open (12, file=outfile)
   do
      read (11,"(a)", iostat = ios) linebuf
      if (ios < 0) exit
      if (linebuf(1:1) /= " ") then     ! no init blank, must be event
         nevent = nevent + 1
         if (nevent /= 1) then          !need to output previous event info
            call WRITE_EVENT
            ndata = 0
            sum = 0
         end if
         event = linebuf
      else                              ! has starting blank, must be measurement
         read (linebuf,*) x
         sum = sum + x
         ndata = ndata + 1
      end if
   enddo
   call WRITE_EVENT
   close (11)
   close (12)
contains
   subroutine WRITE_EVENT
      if (ndata /= 0) then
         xavg = sum/real(ndata)
      else
         xavg = 0.
      end if
      write (12,"(a20, f7.2, i5)") event, xavg, ndata
   end subroutine WRITE_EVENT
end program procfile
The beginning of the program where the files are opened is the same as in
fileinout.f90 (our earlier example of file I/O).
Using an infinite do loop, we read lines from the input file using:
read (11,"(a)", iostat = ios) linebuf
Because we don't know the format of each line before we read it, we read into a
character string (linebuf) that will serve to temporarily store the contents of the line
(a "line buffer" as they say in the computer world).
We use iostat=ios to set ios to a flag that can be used to identify when the
end of the file occurs so that we can exit the do loop. As discussed previously, we
really should also check for an error condition (i.e., ios > 0), but we don't bother
here because errors are unlikely when reading in a string (yes, we are being a bit
sloppy!).
Next, we check the contents of the first character in linebuf. If it is not a blank,
then we know we have an event line. In this case we increment the nevent counter
variable by one. If we have not just read the first event, then we have read a new
event, thus terminating the data input from the previous event. In this case we
need to write the processed event info to the output file (we can't do this until this
point because we don't know how many data points there will be) and reset sum
and ndata to zero. Finally we set event = linebuf to store the event line until we
need to output it.
If, on the other hand, the first character in linebuf is blank, then we have a data
line. We then read the value of x out of linebuf:
read (linebuf,*) x
This is called an internal read and is the first time that we have shown this.
It allows the program to do a formatted read from a string, rather than from an
external file. We do a free format read (*), but one could also use a format specifier.
After reading the measurement value into x, we suitably increment a running sum
of x and the number of measurements.
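Here is a stand-alone sketch of an internal read so that it can be tried outside of procfile. The string contents and variable names are invented for the example. It also shows the iostat check that we skipped above and a formatted (rather than free format) internal read on part of the string.

```fortran
program internalread
   implicit none
   character (len = 40) :: linebuf
   real :: x, y
   integer :: n, ios
   linebuf = "  12.5  -3.25  7"        ! made-up line contents
   ! free-format internal read: the string takes the place of a file unit
   read (linebuf, *, iostat = ios) x, y, n
   if (ios /= 0) then
      print *, "could not parse line: ", trim(linebuf)
   else
      print *, "x, y, n = ", x, y, n
   end if
   ! a format specifier works too; here columns 3-6 of the string hold "12.5"
   read (linebuf(3:6), "(f4.1)") x
   print *, "x from columns 3-6 = ", x
end program internalread
```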
When we reach the end of the input file, we have to output the results of the final
set of measurements. To avoid repeating a block of code for this operation, we use
an F90 internal subroutine (WRITE_EVENT). This is used twice: inside the loop,
where we output information from the previous event before going on to the next one, and
outside the loop, where we output the information from the last event. Note that we
must check to see if there are no measurements for an event line, in which case we
do not divide by zero but simply set xavg to zero. There should be no possibility
of mistaking this for a real measurement in the output file because the number of
measurements in this case will also be zero.
ASSIGNMENT F15
Obtain the data file, scsn.phase.dat, from the class web site. This is a list of
earthquakes and arrival time data from the Southern California Seismic Network
(SCSN) for the first two days of August 2000. (Files like this are readily obtainable
from the SCEC Data Center at http://www.scecdc.scec.org if you ever want to
obtain more of these data. The format is called the SCEC DC phase format.)
Here is what part of the file looks like:
2000/08/01 14:44:12.6 L 1.1 c 35.962 -117.693   5.2 A 9159024    8   51
     WRC VHZ  911 P IU1     4.41    1.26
     WCS VHZ  885 P E 3     9.98    1.85
     WVP VHZ  925 P IU1    11.54    2.25
     TOW VHZ  813 P E 2    18.28    3.78
     CLC VHZ  202 P E 2    18.49    3.37
     WMF VHZ  906 P E 3    22.86    3.94
     WMF VHZ  906 S E 2    22.86    7.08
     WNM VHZ  908 P E 3    23.74    4.23
2000/08/01 14:45:36.8 L 1.3 h 34.806 -116.271   7.3 A 9159025    7   79
     CDY VHZ  184 P IU0     6.66    1.73
     RAG VHZ 1034 P IU1    17.58    3.19
     RAG VHZ 1034 S E 2    17.58    5.77
     RMM VHZ  647 P ID1    37.23    6.33
     TPM VHZ 1033 P IU1    52.29    8.88
     GRP VHZ  355 P IU1    61.21   10.14
     0299 C  65535 P IU1 11395.79    7.39
2000/08/01 15:04:17.9 L 1.4 h 34.810 -116.268   7.1 A 9159030   19   83
     CDY VHZ  184 P IU0     6.76    1.73
     RAG VHZ 1034 P IU1    17.76    3.16
etc.

There is a line for each earthquake with the date and time (UT not local time).
For the first line in the above example, the additional bits include:
L        = local event
1.1      = magnitude
c        = how magnitude was computed (coda in this case)
35.962   = lat
-117.693 = lon
5.2      = depth (km)
A        = SCSN assigned quality (A=best)
9159024  = cusp id number
8        = number of lines of phase arrival time info
The station info lines include:
WRC  = station name
VHZ  = station channel
911  = ?
P    = phase name (normally P or S)
IU1  = phase pick info
4.41 = station-event distance (km)              (=X for assignment)
1.26 = travel time (s from event origin time)   (=T for assignment)


Your task is to write an F90 program to read this file and write to an output
file a series of (X,T) points (one X-T pair per line) for all P arrivals for all
events between 3 and 8 km depth. Then make an X-Y plot of the T-X points using
whatever plotting method you are most familiar with. That is, plot time on the
y-axis and distance on the x-axis and plot the coordinates of each point. E-mail me
a copy of the F90 program source code and a PDF file of the X-Y plot.
The good news about this data format is that it does include a number in the
event line that lists the number of phase lines that will follow. The bad news is
that sometimes the last 1 or 2 phase lines are garbage in that they have numbers
instead of station names and are some kind of calibration info. You will have to figure
out how to recognize and discard these lines in your program. Also the phase lines
begin with a tab character rather than a long series of blanks; this may complicate
things somewhat.
Hint: Don't try to write your entire program all at once. First, get a version
working that simply reads the input file and prints out the parts of each line that
you will need (i.e., the depth and number of phase lines for the event line, the phase
name and the X and T values for the phase lines). Once you have this working
correctly, then go on to add the part to output the (X,T) points.
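As a further hint on the tab problem, here is one possible way to deal with it (a sketch only, not the required approach; the sample line is typed in by hand and the program name is made up). The tab is character number 9 in the ASCII table, so achar(9) can be used to test for it or to replace it with a blank before doing an internal read.

```fortran
program tabcheck
   implicit none
   character (len = 1), parameter :: tab = achar(9)   ! ASCII tab character
   character (len = 80) :: line
   integer :: i
   ! hand-made phase line that starts with a tab
   line = tab // "WRC VHZ  911 P IU1  4.41  1.26"
   if (line(1:1) == tab) then
      print *, "phase line"
   else
      print *, "event line"
   end if
   ! replace every tab by a blank so that later reads see only blanks
   do i = 1, len(line)
      if (line(i:i) == tab) line(i:i) = " "
   enddo
   print *, trim(adjustl(line))
end program tabcheck
```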

4.17 Example sorting routine from Numerical Recipes

The book Numerical Recipes contains a large number of useful subroutines (there
are both Fortran and C versions) that are fully explained in the text. You can find
F77 source code (1st edition of NR) for these routines in:
~shearer/PROG/SUBS/NUMRECIP
To illustrate the use of the Numerical Recipes routines, here is an example
program that generates a list of random numbers, sorts them using the NR subroutine
PIKSRT, and then prints out the sorted list.
program sortrand
implicit none
integer, parameter :: NPTS=100
integer :: i
real, dimension(npts) :: xran
do i = 1, NPTS
call random_number(xran(i))
enddo
call PIKSRT(NPTS,xran)
print "(10f7.4)", (xran(i), i=1,npts)
end program sortrand
The program sets the xran array to random numbers. It then calls the NR
subroutine PIKSRT to sort the numbers before printing them out. We must either
include the PIKSRT source code or link with a PIKSRT object code when we compile
this program. Here is the PIKSRT source code:

      SUBROUTINE PIKSRT(N,ARR)
      DIMENSION ARR(N)
      DO 12 J=2,N
        A=ARR(J)
        DO 11 I=J-1,1,-1
          IF(ARR(I).LE.A)GO TO 10
          ARR(I+1)=ARR(I)
11      CONTINUE
        I=0
10      ARR(I+1)=A
12    CONTINUE
      RETURN
      END
This ugly code is typical of old Fortran programs. It is written in all capital letters,
has line numbers and go to statements, and uses continue statements in the do loops.
Actually, it's better than many examples, because the do loops are indented. The
important thing for us is that the code works; we don't want to bother rewriting it
and possibly breaking it.
In general, F77 and older code will not compile under F90 because of old-style
comment lines (C in column 1) and old-style line continuation flags (non-& character in column 6). This example, however, does not have these problems and is
fully F90 compatible. Thus, we could just append the source code onto the end of
our program. Alternatively, we could compile the subroutine under F90 as follows:
f90 -c piksrt.f -o piksrt_f90.o
which will create a piksrt_f90.o file (replace f90 with gfortran if you are using gfortran). The _f90 suffix is my own convention to label f77 source code that has been compiled
under f90 so that I don't try to link piksrt_f90.o with an f77 program. Unfortunately,
f77 and f90 object files are not always compatible (they are in this case, but it is
dangerous to assume that they always will be, so it is safest to separately compile
the source code in each case). We can then compile and link the main program with:
f90 sortrand.f90 piksrt_f90.o -o sortrand
or by using a Makefile as discussed earlier. This would be our best choice if the
subroutine source code is lengthy and/or is not F90 compatible.
ASSIGNMENT F16
Modify the sortrand program to return the median of a user-specified number
of random numbers. Do the median computation in the form of a subroutine that
returns the median value, but does NOT re-sort the input array in the process.
The random number generation should be in your main program, not within the
MEDIAN subroutine which should be a general purpose routine that simply inputs
an array of numbers and returns the median value. Then whenever you need to
compute a median of some numbers within a program, you can simply call the same
subroutine, e.g.,
call MEDIAN(x, n, xmed)      !x=array, n=# of points, xmed=median

Hint: Within your median-computing subroutine, copy the input array to another
array before calling PIKSRT. Make sure that your program gives the correct result
for both even and odd numbers of points.

4.18 Example of saving values in a subroutine

A common convention in geophysics is to name an instrument site with a short
ASCII string. In seismology, for example, station names are usually designated with
3 to 4 character names. Southern California GPS sites are usually identified with 4
character names.
Often data products are labeled by the station name, without including information about the station, such as its location. Data processing programs will often
require this information. Thus, there is frequently a need for a computer routine
that will retrieve information about a station, given the station name. One could
hardwire this information into the program with a series of if statements, but this
would become very cumbersome for large numbers of stations and would limit the
flexibility and portability of the code. A better approach is to save the station information in a file and have the computer read from the file to access the information
when necessary. However, file I/O is relatively slow so we won't want to open, read,
and close the file every time we need to determine a station location. It would be
much faster to read the station information file once and then save the information
during the time that the program is running. For portability, ideally all of this
overhead would be performed in a function that we could use in many programs
without having to worry about the format of the station file or the size of the arrays
that are required to store it.

In order to do this, we need to retain the values of certain variables within the
function for subsequent calls.


Here is an example that will return station coordinates for some of the GSN
(Global Seismic Network) stations.
program testgetstat
   implicit none
   character (len = 4) :: stname
   real :: slat, slon, selev
   do
      print *, "Enter station name (stop to stop)"
      read (*,"(a)") stname
      if (stname == "stop") exit
      call GET_STAT(stname,slat,slon,selev)
      print *, "slat,slon,selev = ",slat,slon,selev
   enddo
end program testgetstat

! subroutine GET_STAT gets the lat/lon/elev of a station
! with a given name from the GSN station list
!
!   Inputs:  snam  = station name (a4)
!   Returns: flat  = station latitude
!            flon  = station longitude
!            felev = station elevation (m)
!   NOTES: Station list must be alphabetized
!          If station name is not found, then returned
!             variables are set to -999.
!          Program reads and saves station info on first call
!
subroutine GET_STAT(snam,flat,flon,felev)
   implicit none
   integer, parameter :: NMAX=5000
   character (len = 4) :: snam
   character (len = 4), dimension(NMAX) :: stname
   real, dimension(NMAX) :: slat, slon, selev
   real :: flat, flon, felev
   integer :: i, i1, i2, nsta, it
   logical :: firstcall = .true.
   save firstcall, stname, slat, slon, selev, nsta

   if (firstcall) then
      firstcall = .false.
      open (11, file="stlist.gsn", status="old")
      print *, "Reading station file: stlist.gsn"
      do i=1,NMAX
         read (11,7,end=12) stname(i), slat(i), slon(i), selev(i)
7        format (a4, f10.5, f11.5, f5.0)
      enddo
      print *,"***Warning: number of stations in file may exceed ",NMAX
      i = NMAX + 1
12    nsta = i-1
      close (11)
      print *,"Number of stations read = ",nsta
   end if

   i1 = 1
   i2 = nsta
   do it = 1, 15
      if (i1 > i2) exit          ! nothing left to search (avoids index 0)
      i = (i1+i2)/2
      if (snam == stname(i)) then
         flat = slat(i)
         flon = slon(i)
         felev = selev(i)
         return
      else if (snam < stname(i)) then
         i2 = i-1
      else
         i1 = i+1
      end if
   enddo
   print *, "***station not found ", snam
   flat = -999.
   flon = -999.
   felev = -999.
end subroutine GET_STAT
The main program is a short driver program that enables us to test that the
GET_STAT subroutine is working properly. It is always a good idea to use little
programs like this to test functions that are being developed, BEFORE using the
functions in larger programs.
For efficiency, the subroutine reads the station file only the first time the subroutine is called. It then saves the information in memory to use for subsequent
calls. Normally, the values of variables in a subroutine are not retained between
calls. The save command is used to specify those variables that are to be saved
between calls.
This is our first example of a logical variable:
logical :: firstcall = .true.
firstcall can have two possible values: .true. or .false. The initialization to .true. in
the declaration is applied only once, before the first call to the subroutine; it is not
repeated on later calls.
The save statement specifies which variable values are to be saved between subroutine calls:

save firstcall, stname, slat, slon, selev, nsta
To read the station file upon the first call to the subroutine, we write
if (firstcall) then

Notice that a logical variable can be the sole argument for an if test. We then set
firstcall = .false. so that this block of code will not be executed upon subsequent
calls to the subroutine.
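To see the save mechanism in isolation, here is a stripped-down sketch with no file I/O. The subroutine name COUNTCALLS and the counter ncall are invented for this example; the firstcall logic is the same as in GET_STAT.

```fortran
program testsave
   implicit none
   integer :: i
   do i = 1, 3
      call COUNTCALLS
   enddo
end program testsave

! hypothetical example routine: counts how many times it has been called
subroutine COUNTCALLS
   implicit none
   logical :: firstcall = .true.
   integer :: ncall
   save firstcall, ncall
   if (firstcall) then
      firstcall = .false.
      ncall = 0
      print *, "first call: doing one-time setup"
   end if
   ncall = ncall + 1
   print *, "number of calls so far = ", ncall
end subroutine COUNTCALLS
```

The setup message should appear once only, while the counter keeps increasing from call to call because ncall is saved.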
The station file has the form:
AAE    9.02920   38.76560 2442
AAK   42.63900   74.49400 1645
ABKT  37.93040   58.11890  678
ADK   51.88370 -176.68440  116
AFI  -13.90930 -171.77730  706
ALE   82.50330  -62.35000   60
ALQ   34.94620 -106.45670 1840
ANMO  34.94620 -106.45670 1840
ANTO  39.86890   32.79359  883
AQU   42.35389   13.40500  720
.
.
In routines like this, I like to print out information about the file that is being
read during the first call. This reminds the user of the program about what is being
done and is helpful in case there is a problem or error later in the program. For
example, perhaps we have a faulty station file that has only 1 station. Or perhaps
the firstcall flag is not working properly, in which case these output lines will result
each time we call the routine, not just the first time.
For speed, the subroutine does not simply loop through the station list until it
finds a match. Instead, it exploits the fact that the stations are in alphabetical
order. i1 and i2 are indices that are designed to bracket the target station in the list.
Initially, they are set to 1 and nsta, the first and last station indices. Next, i is set
to a point halfway between i1 and i2. The station in the station list at i is compared
to the target station. If they are the same, then we can set the return variables. If
the stlist station is greater than the target station (later in the alphabet), then i2
is set to i-1. If the stlist station is less than the target station, then i1 is set to i+1.
We iterate using this procedure up to 15 times to narrow in on the station name (2**15 = 32768
must be greater than NMAX to make sure we do enough iterations). This method is known as a binary search.
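The search logic is easier to experiment with when it is separated from the file reading. Here is a minimal sketch using the first six station names from the list above; the program and variable names (bsearchdemo, list, want, ifound) are mine, and the loop stops when the bracket is empty rather than after a fixed number of iterations.

```fortran
program bsearchdemo
   implicit none
   integer, parameter :: n = 6
   character (len = 4), dimension(n) :: list
   character (len = 4) :: want
   integer :: i1, i2, i, ifound
   ! the list must already be in alphabetical order
   list = (/ "AAE ", "AAK ", "ABKT", "ADK ", "AFI ", "ALE " /)
   want = "ADK"
   ifound = 0               ! 0 means "not found"
   i1 = 1
   i2 = n
   do while (i1 <= i2)
      i = (i1 + i2)/2       ! midpoint of the current bracket
      if (want == list(i)) then
         ifound = i
         exit
      else if (want < list(i)) then
         i2 = i - 1         ! target is earlier in the alphabet
      else
         i1 = i + 1         ! target is later in the alphabet
      end if
   enddo
   print *, "index of ", want, " = ", ifound
end program bsearchdemo
```

Try changing want to a name that is not in the list to check that ifound stays at zero.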
Note that
else if (snam < stname(i)) then
is a valid check to see if string variables are in alphabetical order.
If we don't find the target stname in the station list, then we print out a warning
message. We also set the station coordinates to numbers that are unlikely to be
confused with real station locations in case the user of the function does not notice
the warning message (or chooses to ignore it!).
ASSIGNMENT F17
Modify GET_STAT to find the nearest station to a given (lat, lon) point and
return the station name and its coordinates. You can get the stlist.gsn file from
http://igppweb.ucsd.edu/~shearer/SIO233/. Name your new function NEAR_STAT
and also include a testnearstat driver program to test the operation of the function.
To find the nearest station, you will need to be able to compute the distance between two points on a sphere. Use the SPH_AZI subroutine from the notes or the SPH_DIST
subroutine from ~shearer/SUBS/sphere_subs.o
Don't bother to do the separate calculation for small values of del.
The NEAR_STAT subroutine should be of the form:
subroutine NEAR_STAT(plat, plon, stname, slat, slon, sdep)
where plat and plon are the coordinates going into the subroutine and stname, slat,
slon and sdep are passed back from the subroutine to the main program.

4.19 Complex numbers

A nice feature of Fortran is that complex numbers are a standard feature. Here is
a program to show how they can be declared and used:
program testcomplex
   implicit none
   complex :: a, b, c
   print *, "Enter first complex number"
   read *, a
   print *, "Enter second complex number"
   read *, b
   c = a*b
   print *, "Product = ", c
   print *, "abs of product = ", abs(c)
   print *, "sqrt of product = ", sqrt(c)
end program testcomplex


Here is an example of running the program:


rock% testcomplex
Enter first complex number
(1, 1)
Enter second complex number
(-.5, .1)
Product = (-0.6,-0.4)
abs of product = 0.7211103
sqrt of product = (0.2460795,-0.81274545)
Complex numbers are represented as a pair of numbers in parentheses separated
by a comma (first number is the real part, second number is the imaginary part).
This is the only format that will work for the free format read. If you try to enter
1, 1
or
1 1
or even
(1 1)
you will get an error.


Notice that in F90 you don't need to use special forms for the functions abs and
sqrt (e.g., cabs and csqrt as you will see sometimes in older code).
There are conversion functions to build complex numbers from two real numbers
and vice versa, as shown in this example code:
program testcomplex2
   implicit none
   complex :: a
   real :: ar, ai
   print *, "Enter real part"
   read *, ar
   print *, "Enter imaginary part"
   read *, ai
   a = cmplx(ar, ai)
   print *, "a = ",a
   a = exp(a)
   print *, "Exp(a) = ", a
   ar = real(a)
   ai = aimag(a)
   print *, "Real part = ", ar
   print *, "Imag part = ", ai
end program testcomplex2
and its output
rock% testcomplex2
Enter real part
1
Enter imaginary part
2
a = (1.0,2.0)
Exp(a) = (-1.1312044,2.4717266)
Real part = -1.1312044
Imag part = 2.4717266

ASSIGNMENT F18
Write a program to test the built in complex exponential function against the
formula
e^z = e^(x+iy) = e^x e^(iy) = (e^x cos y) + i (e^x sin y)
which you will have to adapt for your program. Include a driver program to test
your function for selected input values. Make sure that your code gets the correct
answer for these examples:
exp( 1.1 + 2.3i) = -2.002 + 2.240i
exp( 0.0 + 1.2i) = 0.362 + 0.932i
exp(-0.5 + 0.0i) = 0.607 + 0.000i
Can you find any differences between the built in exp function and the results
obtained with the formula?

4.20 Array operations in F90

F90 allows many operations on vectors and matrices to be performed in single
statements without the necessity of writing do loops over the array indices (as was
necessary in F77). Here is an example program that demonstrates some of these
operations on vectors.
program vectormath
   implicit none
   real, dimension(5) :: a = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /), b = 2.0, c
   real :: x
   print *, "a = ", a
   c = a + 1
   print *, "a + 1 = ", c
   c = 2 * a
   print *, "2 * a = ", c
   c = a * a
   print *, "a * a = ", c
   c = sqrt(a)
   print *, "sqrt(a) = ", c
   c = sin(a)
   print *, "sin(a) = ", c
   c = exp(a)
   print *, "exp(a) = ", c
   print *, "b = ", b
   c = a + b
   print *, "a + b = ", c
   c = a * b
   print *, "a * b = ", c
   x = sum(a)
   print *, "sum(a) = ", x
   c = a
   c(4:5) = 0.0
   print *, "a with two zeros on end = ", c
   x = dot_product(a, b)
   print *, "a dot b = ", x
   x = sum( (a - sum(a)/5. )**2)
   print *, "sum of squares of difference from mean = ", x
end program vectormath



Note that even fairly complicated expressions are possible provided the programmer keeps track of what is a vector and what is a scalar.
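As one more illustration of mixing scalars and vectors, here is a short sketch (program and variable names are my own) that computes the mean and sample standard deviation of a vector without any do loops. The comments note which quantities are scalars and which are vectors.

```fortran
program vectorstats
   implicit none
   real, dimension(5) :: a = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /)
   real :: amean, astd
   integer :: n
   n = size(a)
   ! sum(a) is a scalar, so amean is a scalar
   amean = sum(a)/real(n)
   ! (a - amean) is a vector; squaring acts on each element; sum gives a scalar
   astd = sqrt( sum( (a - amean)**2 )/real(n-1) )
   print *, "mean = ", amean
   print *, "sample standard deviation = ", astd
end program vectorstats
```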
Operations are also possible on matrices. Here are some examples:
program matrixmath
   implicit none
   real, dimension(2, 3) :: a23, b23, c23
   real, dimension(3, 2) :: a32, b32, c32
   real, dimension(2,2) :: a22, b22, c22
   integer, dimension(2) :: loc2
   integer :: i, j, k
   a23(1,1:3) = (/ -5.1, 3.8, 4.2 /)
   a23(2,1:3) = (/ 9.7, 1.3, -1.3 /)
   print *, "Matrix a23 follows"
   call PRINTMAT(a23, 2, 3)
   b32(1:3, 1) = (/ 9.4, -6.2, 0.5 /)
   b32(1:3, 2) = (/ -5.1, 3.3, -2.2 /)
   print *, "Matrix b32 follows"
   call PRINTMAT(b32, 3, 2)
   c22 = matmul(a23, b32)        !this works but matmul(b32, a23) does not
   print *, "Matrix c22 = matmul(a,b) follows"
   call PRINTMAT(c22, 2, 2)
   print *, "maxval(a23) = ", maxval(a23)
   print *, "maxloc(a23) = ", maxloc(a23)
   loc2 = maxloc(a23)
   print *, "loc2 = ", loc2
   print *, "a23(loc2(1), loc2(2)) = ", a23(loc2(1), loc2(2))
   b23 = a23 + transpose(b32)
   print *, "Matrix b23 = a23 + transpose(b32) follows"
   call PRINTMAT(b23, 2, 3)
   print *, "size(a23) = ", size(a23)
   print *, "size(a23(1,:)) = ", size(a23(1,:))
contains
   subroutine PRINTMAT(x, m, n)
      integer :: m, n
      real, dimension(m, n) :: x
      do i = 1, m
         print "(10f8.3)", (x(i,j), j=1,n)
      enddo
   end subroutine PRINTMAT
end program matrixmath
This demonstrates the intrinsic functions matmul, maxval, maxloc, transpose,
and size. Note that matmul(a,b) is true matrix multiplication, not the multiplication
of the individual array elements as occurs from writing simply a*b.
The maxloc function returns the location of the maximum value in the array.
Note that the statement
loc2 = maxloc(a23)
works only if loc2 is defined as an integer array with dimension(2).
NOTE: If maxloc is applied to a one-dimensional array, for example to find the
index of the maximum value of a vector bvec:
loc = maxloc(bvec)
then loc must still be defined as an integer array, in this case with dimension(1). You will get
an error message during compilation if loc is defined simply as an integer.
There are corresponding functions, minval and minloc, that determine the minimum value and location within an array.
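Here is a short sketch of the one-dimensional case (the values in bvec are invented). It also shows an alternative that avoids the size-1 array: as far as I know, compilers that support Fortran 95 or later (gfortran included) accept an optional dim argument, in which case maxloc of a vector returns an ordinary integer.

```fortran
program maxlocdemo
   implicit none
   real, dimension(5) :: bvec = (/ 3.0, 9.5, -2.0, 7.1, 0.4 /)
   integer, dimension(1) :: loc
   integer :: imax
   loc = maxloc(bvec)           ! result is an integer array of size 1
   imax = loc(1)
   print *, "largest value is at index ", imax
   imax = maxloc(bvec, dim=1)   ! with dim=1 the result is a scalar (F95 and later)
   print *, "largest value is at index ", imax
   print *, "smallest value = ", minval(bvec), " at index ", minloc(bvec)
end program maxlocdemo
```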

It would be interesting to test whether programs using the intrinsic array functions
in F90 run any faster than those written using do loops.
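One way such a test could be set up is sketched below. This is only an illustration under my own assumptions (a sum of squares as the test operation, one million points, and the standard system_clock subroutine for timing); the results will depend strongly on the compiler and its optimization flags.

```fortran
program timesum
   implicit none
   integer, parameter :: n = 1000000    ! illustrative array size
   real, dimension(n) :: a
   real :: s1, s2, dt
   integer :: i, count1, count2, count_rate
   call random_number(a)
   ! version 1: explicit do loop
   call system_clock(count1, count_rate)
   s1 = 0.0
   do i = 1, n
      s1 = s1 + a(i)*a(i)
   enddo
   call system_clock(count2, count_rate)
   dt = real(count2 - count1)/real(count_rate)
   print *, "do loop:   sum = ", s1, " dt = ", dt
   ! version 2: intrinsic array function
   call system_clock(count1, count_rate)
   s2 = dot_product(a, a)
   call system_clock(count2, count_rate)
   dt = real(count2 - count1)/real(count_rate)
   print *, "intrinsic: sum = ", s2, " dt = ", dt
end program timesum
```

The two sums may differ slightly in the last digits because the terms can be added in a different order. For such a short calculation the times may be too small to measure reliably, in which case the operation can be repeated many times inside an outer loop.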

4.21 Allocatable arrays

An awkward aspect of older Fortran programs is the need to declare arrays to the
maximum possible size that could ever be needed when running the program. This
memory must be set aside even when it is not needed, which is a wasteful practice. F90
avoids this by allowing memory to be allocated and deallocated on the fly. Here is
an example program:
program setarray
   implicit none
   real, dimension(:), allocatable :: x
   real, dimension(:, :), allocatable :: y, yy
   real :: xsum
   integer :: n, i, j, m
   print *, "Enter number of points"
   read *, n
   allocate (x(n))
   do i = 1, n
      call random_number(x(i))
   enddo
   xsum = sum(x)
   print *, "sum = ", xsum
   deallocate(x)               !this frees up memory
   print *, "Enter m, n (# rows, # columns)"
   read *, m, n
   allocate (y(m,n))
   allocate (yy(m,m))          !y * transpose(y) has m rows and m columns
   do i = 1, m
      print *, "Enter row # ", i
      read *, (y(i, j), j = 1, n)
   enddo
   yy = matmul(y, transpose(y))
   print *, "y * yt = "
   call PRINTMAT(yy, m, m)
   deallocate(y)
   deallocate(yy)
contains
   subroutine PRINTMAT(x, m, n)
      integer :: m, n
      real, dimension(m, n) :: x
      do i = 1, m
         print "(10f8.3)", (x(i,j), j=1,n)
      enddo
   end subroutine PRINTMAT
end program setarray
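When the requested size comes from user input, it is prudent to check whether the allocation succeeded. Here is a minimal sketch (program and variable names are my own) that uses the optional stat= argument of allocate and the allocated intrinsic function, both of which are standard F90:

```fortran
program safealloc
   implicit none
   real, dimension(:), allocatable :: x
   integer :: n, ierr
   print *, "Enter number of points"
   read *, n
   ! stat=ierr returns zero on success, nonzero if the memory is not available
   allocate (x(n), stat = ierr)
   if (ierr /= 0) then
      print *, "***Error: could not allocate ", n, " points"
      stop
   end if
   print *, "allocated(x) = ", allocated(x), "  size(x) = ", size(x)
   x = 0.0
   deallocate (x)
   print *, "allocated(x) = ", allocated(x)
end program safealloc
```

Without stat=, a failed allocation simply terminates the program with a system error message.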

4.22 Structures in F90

It is often convenient to have a user-defined collection of data elements that may, or may
not, be of the same data type. This is called a Structure in C and a Derived
Data Type in F90. This is a new feature that was not included in F77. Suppose,
for example, we wanted to set up a data base containing information (name, age,
height, weight) about 3 different people. Here is a program that uses a structure to
read in this information and print the names of the lightest and oldest persons:
program namebase
   implicit none
   integer, parameter :: nmax=3
   type person
      character (len=20) :: name
      integer :: age
      real :: height, weight
   end type person
   type(person), dimension(nmax) :: student
   integer :: i
   real :: weightmin
   do i = 1, nmax
      print *, "Enter first name, age, height, weight (e.g., Bob 27 69 183)"
      read *, student(i)%name,    &
              student(i)%age,     &
              student(i)%height,  &
              student(i)%weight
   enddo
   print *, "Lightest student = ", student(minloc(student%weight))%name
   print *, "Oldest student = ", student(maxloc(student%age))%name
end program namebase
The statements:
type person
   character (len=20) :: name
   integer :: age
   real :: height, weight
end type person
create a defined type that can be used to declare variables later in the program.
The contents are a character string (of length 20), an integer for the age, and real
numbers for the height and weight. These contents are termed the members or
components of the structure.
In this example, the defined type is given the name person which is NOT a
variable name. Rather it is a name that can later be used to define the actual variable
name(s) that will have this structure. The next line creates the array student that
has this structure:
type(person), dimension(nmax) :: student
In this line, type(person) acts just like the integer and real statements that we
are used to. It can be used to define more than one variable name and these variables
do not need to be arrays. Here is another example of this:
type(person), dimension(nmax) :: grads, undergrads
type(person) :: teacher
which sets up structure arrays for grads and undergrads and a single structure for
teacher.
The members of the structure are referenced by appending %member to the
name, where member refers to the specific member defined in the structure. So, for
example, student(1)%age is the age of student(1). Note that the array index goes
BEFORE the %member, not after. The % operator is also sometimes called the
component selector.
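The following sketch (the program name persondemo and the data values are made up) shows the placement of the array index and also what happens when the index is left off: student%age with no index is the integer array of all the ages, which is what makes the minloc and maxloc calls in namebase possible.

```fortran
program persondemo
   implicit none
   type person
      character (len=20) :: name
      integer :: age
      real :: height, weight
   end type person
   type(person), dimension(2) :: student
   ! made-up values for illustration
   student(1)%name = "Ann"
   student(1)%age = 24
   student(1)%height = 65.0
   student(1)%weight = 130.0
   student(2)%name = "Bob"
   student(2)%age = 27
   student(2)%height = 69.0
   student(2)%weight = 183.0
   ! index first, then component
   print *, trim(student(2)%name), " is ", student(2)%age
   ! with no index, student%age is the array of all the ages
   print *, "ages = ", student%age
   print *, "mean age = ", real(sum(student%age))/real(size(student))
end program persondemo
```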
Note the nifty use of the minloc and maxloc functions, avoiding the need to write
do loops.
This example does not really make obvious the advantages of a structure because
we could easily have maintained separate arrays for name, age, height and weight.
A clearer advantage of the structure approach is that we can reassign all of the
members with a single statement, i.e., we can say:
student(3) = student(2)
rather than having to say:


student(3)%name = student(2)%name
student(3)%age = student(2)%age
etc.
However, we cannot compare structures as in the statement:
if (student(3) == student(2) ) then      !ILLEGAL!!

4.23 Writing fast programs

Modern computers are amazingly fast compared to those from 10 to 20 years ago
and thus even inefficient code will often run fast enough. But any code that takes
more than a few seconds to run is potentially more useful if it could run faster. So
it's always good to think about ways to make code more efficient.
In most cases, there is one key part of the program that takes most of the time.
The first step is to identify that part. If it's not obvious by inspection, then one can
add print statements that keep track of how long the different parts take to run. In
F90, this can be done using the built-in subroutine system_clock, as illustrated in this
example program that tests how long different types of math take:
program testspeed
   implicit none
   integer :: n, i, count1, count2, count_rate, k=2, k2
   real :: dt, x=1.2, x2
   print *, 'Enter number of operations'
   read *, n

! test integer math
   call system_clock(count1, count_rate)
   k2 = 5
   do i = 1, n
      k = k + k2
      k = k - k2
   enddo
   call system_clock(count2, count_rate)
   dt = real(count2-count1)/real(count_rate)
   print *, 'integer +-, dt = ', dt
   call system_clock(count1, count_rate)
   k2 = 5
   do i = 1, n
      k = k*k2
      k = k/k2
   enddo
   call system_clock(count2, count_rate)
   dt = real(count2-count1)/real(count_rate)
   print *, 'integer */, dt = ', dt

! test real math
   call system_clock(count1, count_rate)
   x2 = 3.14159
   do i = 1, n
      x = x + x2
      x = x - x2
   enddo
   call system_clock(count2, count_rate)
   dt = real(count2-count1)/real(count_rate)
   print *, 'real +-, dt = ', dt
   call system_clock(count1, count_rate)
   x2 = 3.14159
   do i = 1, n
      x = x*x2
      x = x/x2
   enddo
   call system_clock(count2, count_rate)
   dt = real(count2-count1)/real(count_rate)
   print *, 'real */, dt = ', dt

! test trig functions
   call system_clock(count1, count_rate)
   x2 = 5.14159
   do i = 1, n
      x = sin(x2)
      x = cos(x2)
   enddo
   call system_clock(count2, count_rate)
   dt = real(count2-count1)/real(count_rate)
   print *, 'sin/cos, dt = ', dt

! test if statement
   call system_clock(count1, count_rate)
   k2 = 5
   do i = 1, n
      k = k + k2
      k = k - k2
      if (k > 2) k = k - k2
   enddo
   call system_clock(count2, count_rate)
   dt = real(count2-count1)/real(count_rate)
   print *, 'integer +- with if statement, dt = ', dt
end program testspeed

On my Macbook Air (2 GHz Intel Core i7), this produces:

shearer@khan 84> ./testspeed
 Enter number of operations
99999999
 integer +-, dt =   0.44499999
 integer */, dt =    1.1700000
 real +-, dt =   0.57099998
 real */, dt =   0.95999998
 sin/cos, dt =    1.7980000
 integer +- with if statement, dt =   0.44499999

Before we continue, we should stop for a minute to be amazed. My little laptop
is doing 200 million operations in about a second or less!
Somewhat surprisingly, real arithmetic is comparable to integer arithmetic and
sometimes even faster. Even more surprisingly, the trig functions only take about a
factor of two longer. And it appears that if statements take hardly any time at all!
Thus much of the advice that I was prepared to give you (try to use integers instead
of reals, avoid computing trig functions and using if statements inside do loops) is
not accurate. Some really smart people must be optimizing computer chips and the
F90 compiler. But it's still worth knowing how to identify the slow part of your
code, so that you can devise ways to make it run faster.
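The same stopwatch idea carries over to other languages. Here is a rough sketch in Python 3 (not part of the Fortran example; the helper name timed and the two test functions are made up for illustration), in which time.perf_counter plays the role of system_clock:

```python
import math
import time

def timed(label, func, n):
    """Run func(n), print and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()          # like the first call to system_clock
    func(n)
    dt = time.perf_counter() - start     # like (count2 - count1)/count_rate
    print(label, "dt =", dt)
    return dt

def integer_addsub(n):
    """Integer add/subtract loop, mirroring the first test in testspeed."""
    k, k2 = 2, 5
    for _ in range(n):
        k = k + k2
        k = k - k2
    return k

def trig(n):
    """sin/cos loop, mirroring the trig test in testspeed."""
    x2 = 5.14159
    for _ in range(n):
        x = math.sin(x2)
        x = math.cos(x2)

n = 100000   # far fewer than in the Fortran run; pure Python loops are much slower
dt_int = timed("integer +-,", integer_addsub, n)
dt_trig = timed("sin/cos,", trig, n)
```

Wrapping each candidate section in a call like this is usually enough to find out which part of a program dominates the run time.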

4.23.1 The -O option

But there is another way to make your code run faster and it's always surprising
to me how many students don't know about it. Most modern computer compilers
have an option to optimize code to make it run faster. In gfortran, this is done
using the -O option, i.e.,
gfortran -O testspeed.f90 -o testspeed
Here is what I get for this program after optimization:
shearer@khan 86> ./testspeed
 Enter number of operations
99999999
 integer +-, dt =   3.50000001E-02
 integer */, dt =   0.22499999
 real +-, dt =   0.19100000
 real */, dt =   0.63099998
 sin/cos, dt =   3.20000015E-02
 integer +- with if statement, dt =   0.12400000

The integer add/subtract is 13 times faster, the integer mult/divide is 5 times
faster, the real add/subtract is 3 times faster, the real mult/divide is 1.5 times faster,
and the sin/cos is an astonishing 56 times faster [1].
Since -O makes code run faster, why not use it all the time? It takes longer to
compile programs using -O and the executables are not as stable, i.e., your program
is more likely to crash or give the wrong answer (!). Thus, I recommend that you
debug your programs without using -O, but then once they are stable and working,
use -O to improve their speed (if necessary).
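The speedup factors are just the ratios of the two sets of timings. As a quick check, here is a small Python 3 sketch that recomputes them from the numbers printed in the two runs above (the dictionary layout is only for illustration):

```python
# Times in seconds copied from the two runs above: (without -O, with -O)
times = {
    "integer +-": (0.44499999, 3.50000001e-02),
    "integer */": (1.1700000, 0.22499999),
    "real +-": (0.57099998, 0.19100000),
    "real */": (0.95999998, 0.63099998),
    "sin/cos": (1.7980000, 3.20000015e-02),
    "integer +- with if": (0.44499999, 0.12400000),
}

# Speedup = unoptimized time / optimized time
speedup = {name: slow / fast for name, (slow, fast) in times.items()}

for name, ratio in speedup.items():
    print(f"{name:20s} {ratio:6.1f} times faster")
```

Your own ratios will differ with the compiler version and the chip, so it is worth repeating the comparison on your machine.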
ASSIGNMENT F19
Compile and run the program testspeed.f90, compiled both with and without
the -O option. Send me the output results for 99,999,999 operations, along with the
details of your computer (i.e., name, type, chip, clock speed). For example, on Macs
go to About This Mac under the Apple menu and send me the processor details.

4.24 Fast I/O in Fortran

Unless you are a serious number cruncher you are likely to find that input/output
operations (reading and writing to file) take more time than any arithmetic operations.
FORTRAN has generally not been considered to be as flexible as C in the
way that it handles i/o operations. However, with a little deviousness, it is possible
to write FORTRAN programs that are capable of extremely fast i/o. The key to
making FORTRAN do fast i/o is to use unformatted binary read/write operations.
Suppose we have a 100000 × 10 real array that we wish to store on disk. We
could output this simply as an ascii file:
program testio1
   implicit none
   real, dimension(100000,10) :: a = 1.1
   integer :: i, j, count1, count2, count_rate
   real :: dt
   open (12, file='out.testio1')
   call system_clock(count1, count_rate)
   do i = 1, 100000
      write (12, '(10e12.4)') (a(i,j), j=1,10)
   enddo
   call system_clock(count2, count_rate)
   dt = real(count2-count1)/real(count_rate)
   print *, 'dt = ',dt
   close (12)
end program testio1

[1] (Footnote to the sin/cos speedup in section 4.23.1.) Does it recognize that we are computing the same thing? I suspect so!

Notice the use of the intrinsic subroutine, system_clock, to monitor how long the output
takes. On my laptop (Macbook Air, 2 GHz Intel Core i7), this takes about 0.7 s
and creates a 12.1 MB file.
Next, we could output this as a binary file using similar loops over the indices:
open (12, file='out.testio2', form='unformatted')
do i = 1, 100000
   write (12) (a(i,j), j=1,10)
enddo
close (12)
On my laptop, this takes about 0.067 s (10 times faster) and generates a 4.8 MB
file. A further improvement occurs when we store the entire array directly:
open (12, file='out.testio3', form='unformatted')
write (12) a
close (12)
On my laptop, this generates a 4.1 MB file in about 0.006 s. The total speed
improvement compared to ASCII is about a factor of 120.
Similar speed improvements can be noted upon reading these files. The message
is that for speedy i/o of large data sets in FORTRAN, we should read/write large
arrays in blocks. This is how Guy's GFS programs work, as do my routines for storing
travel time data.
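The file sizes follow from simple byte counting. Here is a Python 3 sketch of the arithmetic, under three assumptions that are mine rather than stated above: each formatted line ends in a one-byte newline, a default real occupies 4 bytes, and every unformatted write is wrapped in two 4-byte record markers (the usual gfortran layout).

```python
nrow, ncol = 100000, 10

# ASCII: 10 fields of width 12 (format 10e12.4) plus one newline per line
ascii_bytes = nrow * (ncol * 12 + 1)

REAL_BYTES = 4       # assumed size of a default real
MARKER_BYTES = 4     # assumed size of each record marker

# Unformatted, one write per row: 10 reals plus a marker before and after
by_row_bytes = nrow * (ncol * REAL_BYTES + 2 * MARKER_BYTES)

# Unformatted, whole array in one write: a single pair of markers in total
whole_bytes = nrow * ncol * REAL_BYTES + 2 * MARKER_BYTES

for label, nbytes in [("ascii", ascii_bytes),
                      ("binary, row by row", by_row_bytes),
                      ("binary, whole array", whole_bytes)]:
    print(f"{label:20s} {nbytes:10d} bytes = {nbytes / 1e6:.1f} MB")
```

The first two results reproduce the 12.1 MB and 4.8 MB files. The single write comes to about 4.0 MB by this count, so the row-by-row file is larger only because of its 100000 pairs of record markers.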
Of course, it is not always convenient to refer to data within a big array. To
make it easier to keep track of things, it is often a good idea to define a data type
(structure). For example, suppose our array actually consists of 10000 lines of
data with 10 values for each line, consisting, for example, of a 4 character station
name, coordinates for the station, and date/time information. In this case, we can
define a suitable data type and then create an array of these data types.
type station_info
   character (len=4) :: stname
   real :: slat, slon, selev, sec
   integer :: iyr, imon, iday, ihr, imn
end type station_info
type(station_info), dimension(10000) :: stlist
We can now perform i/o very simply using the stlist array, while preserving
the ability to access its individual parts. For example, stlist(5633)%stname is the
station name of the 5633rd line of data, stlist(232)%slat is the latitude of the 232nd
line, etc.
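To see why one such record occupies the same space as 10 four-byte values, here is a Python 3 sketch using the standard struct module. The format string is a hypothetical byte layout for station_info: it assumes 4-byte reals and integers, little-endian byte order, and members stored contiguously in declaration order, none of which Fortran guarantees. The station values are made up.

```python
import struct

# Assumed layout of one station_info record:
# 4 characters, 4 reals (slat, slon, selev, sec), 5 integers (iyr ... imn)
STATION = struct.Struct("<4s4f5i")

# Pack one made-up record into raw bytes
record = STATION.pack(b"ABCD", 33.61, -116.46, 1280.0, 12.5,
                      2012, 11, 30, 8, 15)
print(STATION.size, "bytes per record")

# Unpack the bytes back into named parts, as stlist(i)%stname etc. would give
stname, slat, slon, selev, sec, iyr, imon, iday, ihr, imn = STATION.unpack(record)
print(stname.decode(), slat, slon, iyr, imon, iday)
```

Each record is 40 bytes, the same as one row of 10 reals in the array a.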
F77 does not allow for structures, but a kludge is possible using the equivalence
statement. For example, we could define a small array aa with 10 elements and
set these elements to be equal to the individual variable names that we want:
      real aa(10)
      common/com1/stname,slat,slon,selev,iyr,
     &            imon,iday,ihr,imn,sec
      equivalence (aa(1),stname)
The equivalence statement places the elements of aa at the same locations in memory
as the 10 variable names defined in the common block. The beauty of this is that
these variables can be of mixed data types. In order to examine record 523, we
just set aa to the appropriate part of a:
      i=523
      do 50 j=1,10
         aa(j)=a(i,j)
50    continue
The stname, slat, etc., are now set. This trick is used in Guy's GFS routines and in
many of my older I/O routines.

4.24.1 Ascii versus binary files

There is a tradeoff between speed and convenience for ascii versus binary files. ascii
is easier to work with and to understand and is often the preferred format when file
size or speed is not a problem. However, for large data sets and data processing,
binary is often better because it permits much faster I/O and the files will generally
be smaller than their ascii counterparts. You are also assured of retaining the full
machine precision for numerical values (you don't risk deciding later that you really
should have stored 4 places to the right of the decimal, not 3). However, binary
formats are somewhat less portable than ascii and require more user knowledge
about their format.


ASSIGNMENT F20
Compile and run the program testio.f90 on your computer and send me the
timing results for the three different ways to perform the I/O, along with the details
of your computer (name, type, chip, clock speed). Then try it again after compiling
with the -O option. Does -O help with I/O speed?


Chapter 5

Fun programs
Let's be honest: the examples in most programming books are pretty boring.
Devising efficient sorting algorithms may be of academic interest, but these programs
are pretty far from the science that most of us do. But the only way to get good
at programming is to write lots of programs. It doesn't really matter what they do,
as long as they do something. So to help us stay motivated, this chapter contains
examples of programs that are fun to write and fun to run.

5.1 Tic-tac-toe

   ...nearly all the people I take down there have precisely the same response
   to the prospect of playing ticktacktoe with a chicken. After looking the
   situation over, they say, "The chicken gets to go first!"
   "But she's a chicken," I say. "You're a human being. Surely there should
   be some advantage in that."
   Some of my guests, I always report with some embarrassment, don't stop
   there. Some of them say, "The chicken plays every day. I haven't played
   in years."
   Calvin Trillin, The New Yorker, Feb. 8, 1999

As a goal for some of our F90 programming exercises, we are going to write a
program to play tic-tac-toe. Although tic-tac-toe is a fully solved game in the sense
that it is well known that a draw always results if both players play optimally (indeed
even chickens have been trained to play the game), it is of sufficient complexity that
writing a working program may seem a daunting task to the beginning programmer.
As is usually the case, however, the problem can be made more tractable by splitting
the task up into smaller pieces.
Let us consider how we are going to approach this problem. Clearly we will need
a way to store the current state of the board (who has moved where), some kind of
flag for who is x and who is oh, some kind of flag for who goes first, etc. We will
also need a way to display the board and to input moves from the human player.
There are obviously many, many different ways to write this program. However,
to make sure that our programs will readily be able to play each other, it will be
good to agree on some standardization of the different components.
First, let us assume that the spaces on the board are defined by the numbers
1-9, e.g.,
 1 | 2 | 3
---+---+---
 4 | 5 | 6
---+---+---
 7 | 8 | 9
Thus, when the program asks the human for a move, the human will enter a
number from 1 to 9. For convenience, let's use the same convention for storing the
moves within the program. Define an array called board that will store the moves
that have been made:
integer, dimension(9) :: board
Now, let us specify that board will have the value 0 if the space is still open, 1 if
the human has moved in the space, and 2 if the computer has moved in the space.
Obviously there are other choices that we could have made for this (1 for x, 2 for oh;
1 for 1st player, 2 for 2nd player), but this makes the most sense to me because the
human vs. computer difference is fundamental to the program. x vs. oh is simply
a naming convention; 1st vs. 2nd will simply determine where the program starts.
Let us begin by writing an external subroutine that will print to the screen
the current state of the board. We will need to input to the subroutine the board
variable and a variable to define which player is x in the game. In other words, this
subroutine might have the form:
subroutine printboard(board, playerx)


We assume that playerx = 1 if the human is x, playerx = 2 if the computer is x.


ASSIGNMENT TTT1
Write the external subroutine printboard, together with a test main program to
check if it is working properly. Here are some examples:
board = (/ 1, 2, 1,   &
           0, 2, 0,   &
           0, 0, 0 /)
playerx = 1
printboard should produce:
 x | o | x
---+---+---
   | o |
---+---+---
   |   |
board = (/ 0, 0, 0,   &
           0, 2, 1,   &
           0, 0, 2 /)
playerx = 2
printboard should produce:
   |   |
---+---+---
   | x | o
---+---+---
   |   | x
Note that in these examples board is a 1-D array with nine elements. The & line
continuation characters are used only to make the mapping to the 3x3 board more
obvious. Make sure that you do not change any of the values in the board array or
that of playerx. You only want to print out the current state of the board.
Hint: Define a character array cboard(9) within your subroutine, the elements
of which you set to ' ', 'x', or 'o', depending upon the contents of board and playerx.
Then write a series of print statements to do the actual output.
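The hint can be sketched in Python 3 as follows. This is only an illustration of the mapping from board codes to symbols, not the Fortran subroutine the assignment asks for. The helper board_lines is a made-up name, and the 0-based list indexing is a Python detail.

```python
def board_lines(board, playerx):
    """Return the five text lines (3 rows, 2 separators) showing the board."""
    # Board codes: 0 = open, 1 = human, 2 = computer
    if playerx == 1:
        symbols = {0: " ", 1: "x", 2: "o"}     # human is x
    else:
        symbols = {0: " ", 1: "o", 2: "x"}     # computer is x
    cboard = [symbols[value] for value in board]   # board itself is not changed
    lines = []
    for row in range(3):
        a, b, c = cboard[3 * row: 3 * row + 3]
        lines.append(f" {a} | {b} | {c}")
        if row < 2:
            lines.append("---+---+---")
    return lines

def printboard(board, playerx):
    """Print the current state of the board without modifying it."""
    print("\n".join(board_lines(board, playerx)))

printboard([1, 2, 1, 0, 2, 0, 0, 0, 0], 1)
```

Building the symbols first and printing afterwards keeps the display logic separate from the game state, which is the point of the cboard hint.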

Next, our program will need some way to determine if a proposed move is a valid
move, i.e., if there is currently a blank space in the move location. To implement
this, let us write a function of the form:


integer function checkmove(board, move)


which will return 1 if the move is valid, 0 otherwise. Note that board is the array
defined above and move is the proposed move.
ASSIGNMENT TTT2
Write the function checkmove, together with a test main program to check if it
is working properly. To make the function robust, return zero if move is outside of
the range 1 to 9. Make sure you do not change any of the values in the board array
or of the move variable. Also, do NOT print out anything within this function to
warn of an invalid move. We may want to use this function for multiple purposes
in the program and we don't necessarily want it to print anything out. If a warning
is necessary, the calling program can print out the message.
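The logic of checkmove is short enough to sketch in a few lines of Python 3. Again, this is an illustration rather than the Fortran function itself. The only Python-specific detail is that list indices start at 0, so space number move lives at board[move - 1].

```python
def checkmove(board, move):
    """Return 1 if move (1 to 9) is an open space on board, 0 otherwise."""
    if move < 1 or move > 9:
        return 0                   # out of range: invalid, and nothing is printed
    if board[move - 1] != 0:       # space already taken by player 1 or 2
        return 0
    return 1

board = [1, 2, 1, 0, 2, 0, 0, 0, 0]
print(checkmove(board, 4), checkmove(board, 1), checkmove(board, 12))
```

Note that the range test comes first, so an out-of-range move never gets used as an array index.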

Next, we need a function that will prompt the human for a move. It should
check to see if the move is valid and prompt the human again in the cases of invalid
moves. Let's call this function readmove:
integer function readmove(board)
which will return a valid move as entered by the keyboard.
ASSIGNMENT TTT3
Write the external function readmove, together with a test main program to
check if it is working properly. The function should ask the user to enter a move,
then use the checkmove function to check whether it is valid. If it is valid, return the
move to the main program. If it is invalid, then alert the user and prompt again for
a valid move. Repeat until a valid move is entered. IMPORTANT: Do not change
the board array within the readmove function. That should be done later in the
main program.

Next, we need a function to determine if there is a current winner (3 in a row)
or if the game is drawn (all spaces filled, no winner). This function will have the
form:
integer function checkwinner(board)


and will return 1 if player 1 has won, 2 if player 2 has won, 3 if the game is drawn,
and 0 otherwise.
ASSIGNMENT TTT4
Write the external function checkwinner, together with a test program to check
if it is working properly. Make sure you do not change the board array.
Hint: Define a 2-D integer array that stores the locations of the eight possible
ways to win, e.g.,
integer, dimension(8,3) :: win
win(1,1:3) = (/ 1, 2, 3 /)
win(2,1:3) = (/ 4, 5, 6 /)
win(3,1:3) = (/ 7, 8, 9 /)
win(4,1:3) = (/ 1, 4, 7 /)
   .
   .
etc.
Then loop over these 8 possibilities and check to see if the 3 board values are all
ones or all twos. If there is no winner, then check to see if all board locations are
full (the draw criterion).
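Here is a Python 3 sketch of the table-driven check described in the hint. The first four entries of WINS match the ones listed above. The remaining four (two more columns and the two diagonals) are filled in here from the rules of the game. The name WINS is my own.

```python
# The eight ways to win, using the board numbering 1-9
WINS = [(1, 2, 3), (4, 5, 6), (7, 8, 9),     # rows
        (1, 4, 7), (2, 5, 8), (3, 6, 9),     # columns
        (1, 5, 9), (3, 5, 7)]                # diagonals

def checkwinner(board):
    """Return 1 or 2 for a winner, 3 for a draw, 0 if the game is still open."""
    for a, b, c in WINS:
        first = board[a - 1]
        if first != 0 and first == board[b - 1] == board[c - 1]:
            return first                     # all ones or all twos
    if all(value != 0 for value in board):
        return 3                             # board full, no winner: draw
    return 0

print(checkwinner([1, 1, 1, 0, 2, 2, 0, 0, 0]))
```

Checking for a winner before checking for a full board matters, because a game can be won on the ninth move.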

The most challenging part of your TTT program will be determining the best
move for the computer to make. We will implement this as a function:
integer function findmove(board)
which will return the best move for the computer (player 2) to make. Testing
this function will be easiest once the complete program is finished. To complete
the program, however, one needs a working version of findmove. To get around this
problem, let's write findmove in two stages. First, let's write a very stupid version
that will simply return a valid move. Then, after we have the complete program
working, we can make improvements to findmove.
ASSIGNMENT TTT5
Write a working version of findmove, together with a test main program to check
if it is working properly. Important: Do NOT change the board array within the
function; that should be done only within the main program. Your function
should not use any arguments other than board; that is, playerx is not needed and
should not be used, nor should your findmove function need to know who is going
first. Of course, you can figure out if you are making the first move within findmove
simply by checking to see if the board is empty. Hint: Just loop through the moves
1 to 9 until you find an empty space (board(i) = 0). For a slightly more interesting
stupid program, pick a random valid move.
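Both versions of the "stupid" strategy can be sketched in Python 3 as below. The function names findmove_first and findmove_random are made up for this illustration, and returning 0 for a full board is my own convention, not something the assignment specifies.

```python
import random

def findmove_first(board):
    """Return the first open space (1 to 9), or 0 if the board is full."""
    for move in range(1, 10):
        if board[move - 1] == 0:
            return move
    return 0

def findmove_random(board):
    """Return a randomly chosen open space, or 0 if the board is full."""
    open_moves = [m for m in range(1, 10) if board[m - 1] == 0]
    if not open_moves:
        return 0
    return random.choice(open_moves)

board = [1, 2, 0, 0, 2, 0, 0, 0, 0]
print(findmove_first(board), findmove_random(board))
```

Neither function modifies board; the caller is responsible for recording the move.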

Now we are ready to put these pieces together into a main program that will
actually play the game. This program should ask the user if he/she wants to go
first and if he/she wants x or oh (the latter is the playerx flag in the printboard
function). The board array should be initialized to zero. For each move, the program
will have to determine whose move it is (a function of the move number and who
went first) and then either prompt the user or use the findmove function to pick a
move. Following the move, the program should check for a win or draw. If there is
a win or draw, a suitable message should be printed and the program stopped. If
there is not a winner/draw, then the program should continue with the next move.
ASSIGNMENT TTT6
Construct a working version of the complete tic-tac-toe program.

Finally, we will want to improve on our initial findmove function to make the
program smarter. An obvious strategy for this is to first check to see if any winning
moves are possible. If not, then next check to see if there is a winning move for the
human that should be blocked. Beyond this, you're on your own!
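The win-then-block strategy can be sketched in Python 3 as follows. The helper winning_move and the table WINS are names of my own. The fallback of taking the first open space is just the simplest choice, and it is where your own ideas would go.

```python
# The eight ways to win, using the board numbering 1-9
WINS = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
        (1, 4, 7), (2, 5, 8), (3, 6, 9),
        (1, 5, 9), (3, 5, 7)]

def winning_move(board, player):
    """Return a move that completes three in a row for player, or 0 if none."""
    for line in WINS:
        values = [board[i - 1] for i in line]
        if values.count(player) == 2 and values.count(0) == 1:
            return line[values.index(0)]     # the one open space in this line
    return 0

def findmove(board):
    """Pick a move for the computer (player 2) without changing board."""
    move = winning_move(board, 2)            # 1. win if we can
    if move == 0:
        move = winning_move(board, 1)        # 2. otherwise block the human
    if move == 0:                            # 3. otherwise first open space
        move = next((m for m in range(1, 10) if board[m - 1] == 0), 0)
    return move

print(findmove([2, 2, 0, 1, 1, 0, 0, 0, 0]))   # computer can win at 3
print(findmove([1, 1, 0, 0, 2, 0, 0, 0, 0]))   # computer must block at 3
```

The same helper serves both purposes because blocking the human is just finding the human's winning move and taking it first.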
ASSIGNMENT TTT7
Refine your program to incorporate a smart findmove function. Send me the
complete program, including all functions in an e-mail. I will check the operation of
your program and also attempt to have the programs play each other. To facilitate
the latter goal, please give your findmove function the name:
function findmove_lastname(board)
where lastname is your last name. This will make it easier for me to test your
routines all at once. Please only include board in the argument list to findmove;
this function should not need to know who is X or who went first. Please also make
sure that you do not modify the board array within findmove.

5.2 Fractals

Generating beautiful images of fractals, such as the Mandelbrot set, is surprisingly
easy on the computer and provides good practice in using complex numbers. The
basic idea behind most fractal calculations is that there are certain complex functions
that, when computed over and over again, will either diverge or remain bounded.
Whether or not they diverge turns out to be extremely sensitive to small changes
in the initial complex value that starts the calculation, at least in certain regions of
the complex plane.
The most famous of all fractal images is the Mandelbrot set. To generate the
Mandelbrot set, we begin by considering a complex number, c. We then apply the
following algorithm:
set z = 0 to start
then repeatedly compute z = z*z + c
until |z| > 2 OR the number of iterations exceeds some threshold
then output the number of iterations
For example, if c = 0.3 + 0.3i then
1st iteration:   z =  0.30 + 0.30i    |z| = 0.42
2nd iteration:   z =  0.30 + 0.48i    |z| = 0.57
3rd iteration:   z =  0.16 + 0.59i    |z| = 0.61
4th iteration:   z = -0.02 + 0.49i    |z| = 0.49

In this case, z will remain bounded even after an infinite number of iterations.
However, if c = 0.5 + 1.0i then
1st iteration:   z =   0.50 +  1.00i    |z| = 1.1
2nd iteration:   z =  -0.25 +  2.00i    |z| = 2.0
3rd iteration:   z =  -3.44 +  0.00i    |z| = 3.4
4th iteration:   z =  12.32 +  1.00i    |z| = 12.4
5th iteration:   z = 151.19 + 25.63i    |z| = 153.4

and the size of z explodes toward infinite values. By the 10th iteration it will exceed
the floating point range of most computers. However, we can stop computing z
once its absolute value (the complex absolute value or modulus is defined as the
distance of a point in the complex plane from the origin, i.e., |z| = abs(a+bi) =
sqrt(a*a+b*b) ) exceeds 2 because it can be shown that divergence is guaranteed
at this point.
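The iteration is easy to try out interactively. Here is a Python 3 sketch that uses Python's built-in complex type to reproduce the two examples above. The function names escape_count and first_iterates are made up for illustration, and returning max_iter for points that never escape is a choice made for this sketch.

```python
def escape_count(c, max_iter=1000):
    """Iterations of z = z*z + c needed for |z| > 2, or max_iter if it never happens."""
    z = 0j
    for it in range(1, max_iter + 1):
        z = z * z + c
        if abs(z) > 2:
            return it
    return max_iter

def first_iterates(c, n):
    """Return the first n values of z, starting from z = 0."""
    z = 0j
    values = []
    for _ in range(n):
        z = z * z + c
        values.append(z)
    return values

for c in (0.3 + 0.3j, 0.5 + 1.0j):
    print("c =", c, " escape count =", escape_count(c))
    for k, z in enumerate(first_iterates(c, 4), start=1):
        print(f"  iteration {k}: z = {z.real:7.2f} {z.imag:+.2f}i   |z| = {abs(z):.2f}")
```

For c = 0.5 + 1.0i the test |z| > 2 is already satisfied at the 2nd iteration, where |z| is about 2.02, so the loop stops long before the numbers overflow.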

We repeat this calculation for a series of different values of c and plot the resulting
output as a function of the location of c on the complex plane. The Mandelbrot set
is the set of all complex numbers c for which the size of z^2 + c is finite even after an
infinite number of iterations. We may obtain a good approximation, however, by
checking after some large number of iterations (1000 is a common choice). We also
may take advantage of the fact that it is known that z^2 + c will diverge whenever
|z| > 2.
Here is an example F90 program that performs this calculation for user-specified
parts of the complex plane and outputs a binary file that can be read and plotted
using MatLab or Python:
program mandel
   implicit none
   real, dimension(:), allocatable :: dat
   real :: x1, x2, y1, y2, dx, dy, cr, ci
   integer :: nx, ny, ix, iy, it, idat, ndat, nbytes, status
   integer, dimension(2) :: nxny
   complex :: c, z, z2
   real, dimension(4) :: wind
   print *, 'Enter x1, x2, y1, y2, nx, ny'
   read *, x1, x2, y1, y2, nx, ny
   dx = (x2 - x1) / real(nx)
   dy = (y2 - y1) / real(ny)
   allocate(dat(nx*ny))
   idat=0
   do ix = 1, nx
      do iy = 1, ny
         cr = x1 + dx/2. + dx*real(ix-1)
         ci = y1 + dy/2. + dy*real(iy-1)
         c = cmplx(cr, ci)
         z = cmplx(0.0, 0.0)
         do it = 1, 1000
            z = c + z*z
            if (cabs(z) > 2) exit
         enddo
         idat = idat + 1
         dat(idat) = it
      enddo
   enddo
   print *, 'Finished calculation'
   ndat = idat
   print *, 'ndat = ',ndat
   open (12, file='fractal.bin', form='unformatted')
   nxny(1) = nx
   nxny(2) = ny
   write (12) nxny
   wind(1) = x1
   wind(2) = x2
   wind(3) = y1
   wind(4) = y2
   write (12) wind
   write (12) dat
end program mandel
The program computes the result for a grid of points in the complex plane,
defined by the user-defined limits x1 and x2 for the real part and y1 and y2 for the
imaginary part. The number of points computed in x and y are input as nx and ny.
This will determine the resolution of the plot. For example, nx=50 and ny=50 will
result in an image 50 pixels wide by 50 pixels tall. Although the image is really
two-dimensional, I store the results in the 1-D array, dat, because this simplifies the
output of the results. Notice how allocate is used to set dat to dimension(nx*ny)
on the fly.
The spacing in x and y are defined by:
dx = (x2 - x1) / real(nx)
dy = (y2 - y1) / real(ny)
Note the conversion of the integer variables nx and ny to floats prior to performing
the division.
Here are the main loops that actually do the calculation:
idat=0
do ix = 1, nx
   do iy = 1, ny
      cr = x1 + dx/2. + dx*real(ix-1)
      ci = y1 + dy/2. + dy*real(iy-1)
      c = cmplx(cr, ci)
      z = cmplx(0.0, 0.0)
      do it = 1, 1000
         z = c + z*z
         if (cabs(z) > 2) exit
      enddo
      idat = idat + 1
      dat(idat) = it
   enddo
enddo
The real and imaginary parts of c are set to the center of the pixel with sides dx
and dy. The calculation is done for a maximum of 1000 iterations. The calculation
is halted if |z| > 2 by exiting the do loop. The iteration number is saved in the dat
array, using idat as a counter.
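Two bits of bookkeeping in these loops are worth checking by hand: where the pixel centers fall, and where each pixel ends up in the 1-D dat array. Here is a small Python 3 sketch of both, keeping the 1-based indices of the Fortran loops. The function names are made up, and the 5-pixel grid is just an example.

```python
def pixel_center(x1, x2, nx, ix):
    """Center of pixel ix (1-based) when [x1, x2] is split into nx pixels."""
    dx = (x2 - x1) / float(nx)
    return x1 + dx / 2.0 + dx * (ix - 1)

def flat_index(ix, iy, ny):
    """1-based position in dat of pixel (ix, iy); iy varies fastest."""
    return (ix - 1) * ny + iy

# Example: 5 pixels across the real axis from -2 to 0.5
x1, x2, nx = -2.0, 0.5, 5
centers = [pixel_center(x1, x2, nx, ix) for ix in range(1, nx + 1)]
print(centers)

# Positions of a few pixels in dat for a 50 x 50 grid
print(flat_index(1, 1, 50), flat_index(2, 1, 50), flat_index(50, 50, 50))
```

Because iy is the inner loop, dat holds all ny values for ix = 1 first, then all of those for ix = 2, and so on. Any program that reads the file needs to unpack the values in that same order.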
The next step is to save this information to a file. For speed in the case of a
large dat array, we use a binary file. We create and open the file with
open (12, file='fractal.bin', form='unformatted')
We use form='unformatted' to specify that this is a binary file, not an ascii file.
Before writing the dat array to the file, we write some header information that
will help MatLab or Python to know the size of the array and the values of x1, x2,
y1, and y2. Writing to a binary file in Fortran is done using the write statement
without a format. We set up the 2-element int array, nxny, to handle the output of
nx and ny:
nxny(1) = nx
nxny(2) = ny
write (12) nxny
and we set up a 4-element float array, wind, to handle the output of x1, etc.
wind(1) = x1
wind(2) = x2
wind(3) = y1
wind(4) = y2
write (12) wind
Finally, we write the output array to the file
write (12) dat
For this program, it is a good idea to compile with the -O compiler option to
make it run faster:
gfortran -O mandel.f90 -o mandel
The Mandelbrot set is approximated in our program by places on the complex
plane where the output value (contained in the dat array) is 1001, i.e., the z function
has not diverged even after 1000 iterations (when a Fortran do loop runs to completion,
the loop index is left one past its upper limit). Output values of 1000 or less provide a
measure of how quickly the function diverges (small values indicate rapid divergence,
larger values indicate slower divergence). Plots of the output usually use colors to
indicate the different output values.

5.2.1 Plotting using MatLab

Here is how to plot our output using Matlab:
clf reset
clear
[fid, message] = fopen('fractal.bin')
% Fortran I/O has extra four bytes at beginning and end of writes
[dum, count] = fread(fid, [1], 'int');
[size, count] = fread(fid, [2], 'int');
[dum, count] = fread(fid, [1], 'int');
nx = size(1)
ny = size(2)
[dum, count] = fread(fid, [1], 'int');
[wind, count] = fread(fid, [4], 'float')
[dum, count] = fread(fid, [1], 'int');
x1 = wind(1)
x2 = wind(2)
y1 = wind(3)
y2 = wind(4)
[dum, count] = fread(fid, [1], 'int');
% dat is a real array in mandel.f90 and iy varies fastest in the file,
% so read floats into ny rows by nx columns
[z, count] = fread(fid, [ny,nx], 'float');
fclose(fid);
z2 = log10(z);
axis( [x1, x2, y1, y2] ); axis('square'); hold on;
imagesc( [x1, x2], [y1, y2], z2 );
colorbar;
One detail is that Fortran (unlike C) writes an extra 4 bytes (a record-length
marker) before and after each of the binary writes. So we need to read these extra
bytes with our Matlab script into a dummy variable or we will not be properly
aligned with the data fields.
We set the output array to z and use imagesc to plot the result using the default
color scheme. We plot log10(z) rather than z because it shows some of the details a
little better.

5.2.2 Plotting using Python

Here is one way to plot our output using Python, a script called plotfractal.py:
#! /usr/bin/env python
import numpy
import array
import matplotlib
matplotlib.use("TkAgg")              #set "backend" so plot appears on screen
import pylab

nxny = array.array('l')
wind = array.array('f')
dat = array.array('f')
file = open('fractal.bin', mode='rb')      #open in binary mode
b1 = nxny.fromfile(file, 4)          #deprecated .read does same thing as .fromfile
nx = nxny[1]                         #note that nxny goes from 0 to 3, we exclude ends
ny = nxny[2]
print 'nx, ny = ', nx, ny
b2 = wind.fromfile(file, 6)
x1 = wind[1]
x2 = wind[2]
y1 = wind[3]
y2 = wind[4]
print 'x1,x2,y1,y2 =', x1, x2, y1, y2
b3 = dat.fromfile(file, 1 + nxny[1]*nxny[2])
print 'dat[1] to dat[4] = ', dat[1], dat[2], dat[3], dat[4]
dx = (x2 - x1)/nx
dy = (y2 - y1)/ny
x = numpy.arange(x1 + dx/2., x2, dx)
y = numpy.arange(y1 + dy/2., y2, dy)
X,Y = pylab.meshgrid(x,y)
z = X + Y
k = 0
for iy in range(0, ny):              #only goes to ny-1!
    for ix in range(0, nx):          #only goes to nx-1!
        k = k + 1
        z[ix, iy] = dat[k]
zlog = numpy.log10(z)                #plots look better if we take log

pylab.imshow(zlog,interpolation='bilinear',\
    extent=(x1,x2,y1,y2), \
    cmap=pylab.cm.Spectral)  # cm is color map (nice choices: Accent, Spectral, Reds)
pylab.colorbar()
pylab.axis('equal')
pylab.show()
Because I'm not a Python expert, there are probably better ways to do this, but
this is what I have figured out so far.
We follow Lisa's example for plotting the gravity anomaly and import matplotlib.
We also import array which helps in setting up the binary input format.
The following statements set up the format for the three arrays in the file, where
'l' is for reading long (4-byte) integers and 'f' is for reading reals. (On many 64-bit
systems 'l' is 8 bytes rather than 4; if so, use 'i' instead.)
nxny = array.array('l')
wind = array.array('f')
dat = array.array('f')
The following statement opens up the binary file fractal.bin for reading in binary
mode:
file = open('fractal.bin', mode='rb')      #open in binary mode
Next, we read into the nxny array:
b1 = nxny.fromfile(file, 4)
This array has 4 values, but the first and last are discarded because Fortran writes
an extra 4 bytes before and after every binary write. Because the array index goes
from 0 to 3, we assign the middle two values as:
nx = nxny[1]
ny = nxny[2]
In a similar manner, we read in and assign the wind array and the x1, x2, y1,
y2 values:
b2 = wind.fromfile(file, 6)
x1 = wind[1]
x2 = wind[2]
y1 = wind[3]
y2 = wind[4]

Knowing the values of nx and ny, we read in the appropriate sized dat array:
b3 = dat.fromfile(file, 1 + nxny[1]*nxny[2])
Note that b1, b2, and b3 are not used. This seems awkward to me, but as a Python
novice I'm not sure how else to do this.
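One alternative, offered only as a sketch, is to read each Fortran record explicitly with the standard struct module, so that the record markers are checked rather than carried along as dummy array elements. The example is written for Python 3 and assumes 4-byte little-endian record markers, integers and reals. The helpers read_record and write_record are names of my own, and the tiny file is a made-up stand-in for fractal.bin so that the example runs by itself.

```python
import os
import struct
import tempfile

def read_record(f):
    """Read one Fortran unformatted record and return its payload bytes."""
    (nbytes,) = struct.unpack("<i", f.read(4))     # leading length marker
    payload = f.read(nbytes)
    (check,) = struct.unpack("<i", f.read(4))      # trailing length marker
    if check != nbytes:
        raise ValueError("record markers disagree")
    return payload

def write_record(f, payload):
    """Write payload between two matching length markers (demo only)."""
    marker = struct.pack("<i", len(payload))
    f.write(marker + payload + marker)

# Build a stand-in file with the same three records as mandel writes
path = os.path.join(tempfile.mkdtemp(), "fractal_demo.bin")
with open(path, "wb") as f:
    write_record(f, struct.pack("<2i", 2, 3))                    # nxny
    write_record(f, struct.pack("<4f", -2.0, 0.5, -1.25, 1.25))  # wind
    write_record(f, struct.pack("<6f", 1, 2, 3, 4, 5, 1001))     # dat

# Read it back, one record per write statement
with open(path, "rb") as f:
    nx, ny = struct.unpack("<2i", read_record(f))
    x1, x2, y1, y2 = struct.unpack("<4f", read_record(f))
    dat = struct.unpack(f"<{nx * ny}f", read_record(f))

print(nx, ny, x1, x2, y1, y2, dat)
```

With this approach dat starts at index 0 with the first data value, so there are no marker elements to skip.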
Next, we compute dx and dy and set up x,y and X,Y in the same way that Lisa
did in her gravity example:



dx = (x2 - x1)/nx
dy = (y2 - y1)/ny
x = numpy.arange(x1 + dx/2., x2, dx)
y = numpy.arange(y1 + dy/2., y2, dy)
X,Y = pylab.meshgrid(x,y)

Note that the last value of x will be x2 - dx/2.


Next, we need to assign the values in the dat array to a 2-D array aligned with
X,Y. Following Lisa's example, I first assign z to dummy values in a way that ensures
that it has the correct dimensions:
z = X + Y
Next, we explicitly copy the values from dat to the z array:
k = 0
for iy in range(0, ny):              #only goes to ny-1!
    for ix in range(0, nx):          #only goes to nx-1!
        k = k + 1
        z[ix, iy] = dat[k]
The plot will look nicer if we take the log before plotting:
zlog = numpy.log10(z)
Finally we make the plot, again following Lisa's example:
pylab.imshow(zlog,interpolation='bilinear',\
    extent=(x1,x2,y1,y2), \
    cmap=pylab.cm.Spectral)  # cm is color map (nice choices: Accent, Spectral, Reds)
pylab.colorbar()
pylab.axis('equal')
pylab.show()

Note that we have added the extent option to properly define the axes labels.
It's fun to explore the effect of different color maps in Python. Here is a site that
shows lots of them: http://www.scipy.org/Cookbook/Matplotlib/Show_colormaps

5.2.3 Good targets and more about fractals

The entire Mandelbrot set may be imaged in the window
real -2 to 0.5, imag -1.25 to 1.25      (i.e., -2, 0.5, -1.25, 1.25)
It is also fun to zoom in on details of the image. Here are some suggestions for
places to look:
-0.95 -0.90 0.25 0.30
-0.75 -0.65 0.25 0.35
-0.73 -0.72 0.282 0.292
-0.703 -0.693 0.26 0.27
Further details about generating fractals may be found in a series of articles by
A.K. Dewdney in the Computer Recreations section that used to appear in Scientific
American. The August 1985 article describes the Mandelbrot set, the November
1987 article talks about the related Julia sets, and the July 1989 article is about
fractals called biomorphs that look like little creatures. There are also lots of
books on fractals that have many beautiful images.
The study of fractals is related to chaos theory and the fact that many physical
systems (weather is one example) are inherently unpredictable on large time scales
because small perturbations to the starting conditions will cause large changes over
time.
ASSIGNMENT FRAC1
Rewrite the mandel.c and plotfractal.m (or plotfractal.py) programs to plot examples of the Julia sets and the biomorphs. You will want to first read the appropriate Scientific American articles.
1. Produce PDF files of 3 especially nice images of Julia sets.
2. Reproduce and make images of as many of the biomorphs as you can that are
shown on p. 110 of the biomorph article.
Part (2) is harder because you will have to generate more complicated complex
functions than those that are used for the Mandelbrot and Julia sets. You may have
to consult a book on complex numbers to find suitable formulae to compute sin(z),
z^z, etc.
Please e-mail me the PDF files and copies of your F90 and Matlab or Python
source code.
Have fun!


Chapter 6

Lisa's Python Notes


6.1

Introducing Python

You may be wondering why, after spending over a month learning Fortran, we
are now switching to Python. The answer is, once you've done all your fancy
calculations in Fortran, how were you planning on looking at your data? Now that you
have experienced GMT, you know it is possible to do a lot of different plots with, say, psxy,
but getting your data into a suitable format requires some manipulation and
the plots are rather clunky. We at SIO often use Bob Parker's Plotxy program,
which produces lovely X-Y plots. Other options are Excel, KaleidaGraph or
Matlab, all of which are expensive (especially Matlab), and Excel plots are downright
ugly. But the modern standard is to be able to plot in 3D and to make plots
interactive. So, why Python? It is flexible, freely available and cross platform. It
is easy to learn and well documented. There are numerous online tutorials, for
example at: http://docs.python.org/tutorial/ or this new (and very excellent) one:
http://www.python-course.eu/index.php. There are lots of numerical, statistical
and visualization packages. Finally, it is easy to install. Really. Go to:
http://www.enthought.com/products/edudownload.php,
supply your academic e-mail address and follow the instructions. The only potentially tricky bit is setting your path properly. To do this, just put the following line
at the end of your .cshrc file:
set path = (/Library/Frameworks/Python.framework/Versions/Current/bin/ $path)
To make sure you have installed Python properly, simply type:
% python
You should get something that looks like this:

Enthought Python Distribution -- www.enthought.com


Version: 7.1-2 (32-bit)
Python 2.7.2 |EPD 7.1-2 (32-bit)| (default, Jul 27 2011, 13:29:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "packages", "demo" or "enthought" for more information.
>>>
If you don't get something about Enthought Python, you are probably using the
standard Mac OS version, which has none of the whistles and bells we want (plotting,
numerical packages, etc.), so start over with the installation instructions above. Otherwise, we are
ready to begin!

6.2

An Interactive Python session

As you have just fired up Python, you are in an interactive mode with the prompt
>>>. Like Matlab, you can just start typing commands. After each command,
press the return key and see what happens:
>>> a=2
>>> a
2
>>> b=2
>>> c=a+b
>>> c
4
>>> c+=1
>>> c
5
>>> a=2; b=2; c=a+b; c
4
>>> d,e,f=4,5,6 # note syntax! d=4;e=5;f=6
>>> # note also that the symbol # means the rest of the line is a comment!

To get out of Python interactive mode and back to your beloved command line,
hold down the control key (hereafter ^) and type D. PC users may have to
type ^Z followed by return instead.
From your Fortran programming experience, you will recognize a, b, and c in the
above session as variables and + as an operation. You may have been surprised
that we didn't declare variables up front (C programmers always have to). And, the
variables above are obviously behaving as integers, not floating point variables (no
decimals) but their names do not begin with letters between i and n. And there were funny lines that
seemed to set three numbers at once:


>>> a=2; b=2; c=a+b; c


and
>>> d,e,f=4,5,6 # note syntax! d=4;e=5;f=6
Here are some rules governing variables and operations in Python:
Variable names are composed of letters, digits and the underscore (_). They cannot begin with a digit.
They are case sensitive: a is not the same as A.
They do NOT have to be specified in advance (unlike C).
In Fortran, variables beginning with i through n are integers by default and all others are floating point - not the case in
Python. You can make them whatever you want.
+ adds, - subtracts, * multiplies, / divides, % gives the remainder, ** raises
to the power.
These two are fun: += and -=. They increment and decrement respectively.
Parentheses determine order of operation (as in Fortran).
For math functions, we can use various modules that either come standard
with Python (the math module) or are additions that come with the Enthought Python Distribution we are using (the NumPy module). A module is just
a collection of functions. NB: There is online help for any Python function or
method: just type help(FUNC).
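
A short sketch of the operators listed above. The variable names are only for illustration, and print() is written as a function so that the lines run under a current Python as well as 2.7. One caution that the list does not spell out: in Python 2.7, / applied to two integers throws away the remainder, so the sketch uses // when an integer result is wanted and a float when it is not.

```python
# Arithmetic operators on plain Python numbers.
a = 7
b = 2

total = a + b                  # 9
difference = a - b             # 5
product = a * b                # 14
remainder = a % b              # 1, what is left over after dividing 7 by 2
power = a ** b                 # 49, 7 squared
floor_quotient = a // b        # 3, integer division (drops the remainder)
true_quotient = a / float(b)   # 3.5, a float forces floating point division

count = 10
count += 5                     # same as count = count + 5
count -= 3                     # same as count = count - 3

print(total, difference, product, remainder, power)
print(floor_quotient, true_quotient, count)
```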

6.3

A first look at NumPy

O.K. First of all - how do you pronounce NumPy? According to Important People
at Enthought (e.g., Robert Kern), it should be pronounced "Num" as in Number
and "Pie" as in, well, pie, or Python. Oops. It is way more fun to say "Numpee"!
So, that out of the way, what can NumPy do for us? Turns out, a whole heck of
a lot! But for now we will just scratch the surface. It can, for example, give us the
value of π as numpy.pi. Note how the module name comes first, then the name of
the function or constant we wish to use. In this case, numpy.pi is just a constant holding the value of π.
To use NumPy functions, we must first import the module with the command
import. The first time you do this after installing Python, it may take a while, but
after that it should be very quick.



There are four styles of the import command which all do pretty much the same
thing but differ in how you have to call the function after importing:

>>> import numpy


>>> numpy.pi
3.1415926535897931

This makes all the functions in NumPy available to you, but you have to call them
with the numpy.FUNC syntax.

>>> import numpy as np # or anything else! e.g., some use: import numpy as N
>>> np.pi # or N.pi in the second case.
3.1415926535897931

This does the same as the first, but allows you to call NumPy anything you want
- to save typing? To import all the functions from NumPy and not have to type
numpy at all:
>>> from numpy import *
>>> pi
3.1415926535897931
This imports all the umpty-ump functions, which is a heavy load on your memory,
but you can also just import a few, like pi or sqrt:
>>> from numpy import pi, sqrt # pi and square root
>>> pi
3.1415926535897931
>>> sqrt(4)
2.0
Did you notice how sqrt(4), where 4 was an integer, returned a floating point variable
(2.0)?
Good housekeeping Tip #1: I tend to import modules using the first option
above. That way I know what module the functions I'm using are coming from - especially because we don't know off-hand ALL the functions available in any given
module and there might be conflicts with my own function names or two different
modules could have the same function (like math and numpy).
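
To see why Tip #1 matters, here is a small sketch using two modules from the standard library, math and cmath, chosen only because both define a function called sqrt. Keeping the module name in front makes it obvious which one is being called. The try/except lines, which these notes have not introduced, are there only to catch the error so that the script keeps running.

```python
import math
import cmath

# Both modules define sqrt, but they behave differently for negative input.
complex_root = cmath.sqrt(-1)      # complex math: returns 1j

try:
    real_root = math.sqrt(-1)      # real math: not defined for negative numbers
except ValueError:
    real_root = None               # math.sqrt raised an error instead

print(complex_root)
print(real_root)
```

Had both been imported with from ... import sqrt, the second import would silently replace the first.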
Here is a (partial) list of some useful NumPy functions:


absolute(x)    absolute value
arccos(x)      arccosine
arcsin(x)      arcsine
arctan(x)      arctangent
arctan2(y,x)   arctangent of y/x in correct quadrant (***very useful!)
cos(x)         cosine
cosh(x)        hyperbolic cosine
exp(x)         exponential
log(x)         natural logarithm
log10(x)       base 10 log
sin(x)         sine
sinh(x)        hyperbolic sine
sqrt(x)        square root
tan(x)         tangent
tanh(x)        hyperbolic tangent
Note that in the trigonometric functions, the argument is in RADIANS! You can
convert from degrees to radians by multiplying by numpy.pi/180. Also notice how
these functions have parentheses, as opposed to numpy.pi which has none. The
difference is that these are functions that take arguments, while numpy.pi is a constant that just holds the value of π.
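
Here is a sketch of the radians rule and of why the two-argument arctangent is so useful. It uses the standard math module (math.atan2 takes its arguments in the same y, x order as numpy.arctan2) so that it runs without any extra packages; the point coordinates are made up for the illustration.

```python
import math

deg2rad = math.pi / 180.0                # multiply degrees by this to get radians

sin30 = math.sin(30.0 * deg2rad)         # very close to 0.5

# A point in the second quadrant (x negative, y positive).
x, y = -1.0, 1.0
naive = math.atan(y / x) / deg2rad       # about -45 degrees: wrong quadrant
correct = math.atan2(y, x) / deg2rad     # about 135 degrees: right quadrant

print(round(sin30, 6))
print(round(naive, 6))
print(round(correct, 6))
```

The one-argument arctangent only sees the ratio y/x, so it cannot tell (-1, 1) from (1, -1); the two-argument form keeps the signs of both.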
If you are desperate, you can always use your Python interpreter as a calculator:
>>> import numpy
>>> a=2
>>> b=-12
>>> c=16
>>> quad1 = (-b + numpy.sqrt(b**2-4.*a*c))/(2.*a)
>>> quad1
4.0
>>> y=numpy.sin(numpy.pi/6.)
>>> y
0.49999999999999994

6.4

Variable types

The time has come to talk about variable types. We've been very relaxed up to
now, because we don't have to declare them up front and we can often even change
them from one type to another on the fly. But - variable types matter, so here goes.
Like Fortran, Python has integer (both plain and long), floating point, string and
complex variable types. It is pretty clever about figuring out what is required. Here
are some examples:
>>> number=1 # an integer
>>> Number=1.0 # a floating point
>>> NUMBER='1' # a string
>>> cnumber=1j # a complex number with imaginary part 1
>>> complex(3,1) # the complex number 3+1i
(3+1j)
Try doing math with these!
>>> number+number
2             [an integer]
>>> number+Number
2.0           [a float]
>>> NUMBER+NUMBER
'11'          [a string]
>>> number+NUMBER     [gives you an angry error message]
Lesson learned: you can't add a number and a string, and string addition is different from number addition!
You also have to be careful with integers. In Python 2, dividing one integer by another
gives an integer and throws away everything after the decimal point (7/2 is 3, not 3.5), when you may really have wanted all
those numbers after the decimal! So, if you want a float, use a float.
You can convert from one type to another (if appropriate) with:
int(Number); str(number); float(NUMBER);
long(Number); complex(real,imag)

int() truncates a float to an integer, long() converts to a long integer (one with an unlimited
number of digits), and complex() converts the two parts to a complex number.
There is another kind of variable called boolean, which can only take the values True and
False. Booleans are combined with the operators and, or, and not. For the record, the integer 1 is True and 0 is False. These can be used
to control the flow of the program as we shall learn later.
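
The conversions and the boolean values can be tried out in a few lines. This is only a sketch; the names on the left-hand side are made up for the illustration, long() is left out because it exists only in Python 2, and the try/except lines (not yet introduced in these notes) are there just to catch the error so that the script keeps running.

```python
number = 1        # an integer
Number = 1.0      # a floating point number
NUMBER = '1'      # a string

int_from_float = int(2.7)            # 2: int() truncates toward zero
float_from_string = float(NUMBER)    # 1.0
string_from_int = str(number)        # '1'
z = complex(3, 1)                    # the complex number 3+1j

joined = NUMBER + NUMBER             # '11': + concatenates strings
summed = number + Number             # 2.0: integer plus float gives a float

try:
    bad = number + NUMBER            # a number plus a string is an error
except TypeError:
    bad = None

both = True and False                # False
either = True or False               # True
one_is_true = (1 == True)            # True: the integer 1 compares equal to True

print(int_from_float, float_from_string, string_from_int, z)
print(joined, summed, bad, both, either, one_is_true)
```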

6.5

Data Structures

In Fortran, you encountered arrays, which were a nice way to group a sequence of
numbers that belonged together. In Python of course we also have arrays, but we
also have more complicated data structures, like lists, tuples, and dictionaries, that
group arbitrary variables together, like strings and integers and floats - whatever
you want really. We'll go through some attributes of the various data structures,
starting with lists.

6.5.1

Lists

Lists are denoted with [] and can contain any arbitrary set of items, including
other lists!


Items in the list are referred to by an index number, starting with 0. Note
that this is different from Fortran which starts counting in arrays with the
number 1.
You can count from the end to the beginning by starting with -1 (the last item
in the list), -2 (second to last), etc.
Items can be sorted, deleted, inserted, sliced, counted, concatenated, replaced,
added on, etc.
Examples:
>>> mylist=['a',2.0,400,'spam',42,[24,2]] # defines a list
>>> mylist[2] # refers to the third item
400
>>> mylist[-1] # refers to the last item
[24, 2]
>>> mylist[1]=26.3 # replaces the second item
>>> del mylist[3] # deletes the fourth element
To slice out a chunk of the middle of a list:
>>> newlist=mylist[1:3]
This takes items 2 and 3 out (note it takes out up to but not including the last item
number - don't ask me why). Or, we can slice it this way:
>>> newlist=mylist[3:]
which takes from the fourth item (index 3, because we start counting from 0!) to the end.
To copy a list BEWARE! You can make a "copy" - but it isn't a copy like in
Fortran, it is just another name for the SAME OBJECT, so:
>>> mycopy=mylist
>>> mylist[2]='new'
>>> mycopy[2]
'new'
See how mycopy got changed when we changed mylist? To spawn a new list that is
a copy, but an independent entity:
>>> mycopy=mylist[:]
Now try:
>>> mylist[2]=1003
>>> mycopy[2]
'new'


So now mycopy stayed the way it was, even as mylist changed.
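
The whole copy business fits in one short script. This is a sketch with made-up names (alias, sliced); the last two lines add one caution that goes a little beyond the session above: the slice copy is shallow, so a list nested inside the list is still shared.

```python
mylist = ['a', 2.0, 400, 'spam', 42, [24, 2]]

alias = mylist           # a second name for the SAME list
sliced = mylist[:]       # a new list holding the same items

mylist[2] = 'new'

same_object = alias is mylist    # True
alias_value = alias[2]           # 'new': the alias sees the change
sliced_value = sliced[2]         # 400: the slice copy does not

# The slice copy is shallow: the inner list [24, 2] is still shared.
mylist[-1].append(99)
inner_of_sliced = sliced[-1]     # [24, 2, 99]

print(same_object, alias_value, sliced_value, inner_of_sliced)
```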


Unlike Fortran, Python is object oriented, a popular concept in coding circles.
We'll learn more about what that means later, but for right now you can walk around
feeling smug that you are learning an object oriented programming language. O.K.,
what is an object? Well, mylist is an object. Cool. What do objects have that
might be handy? Objects have methods which allow you to do things to them.
Methods have the form: object.method()
Here are two examples (starting again from the original mylist):
>>> mylist.append('me too') # appends a string to mylist
>>> mylist.sort() # sorts the list alphanumerically
>>> mylist
[2.0, 42, 400, [24, 2], 'a', 'me too', 'spam']
Python 2 sorts a mixed list with the numbers first, then lists, then strings.
For a complete list of methods for lists, see: http://docs.python.org/tutorial/datastructures.html#more-on-lists

6.5.2

More about strings

Numbers are numbers. While there are more kinds of numbers (complex, etc.),
strings can be more interesting. Unlike in Fortran, they can be denoted with single,
double or triple quotes: e.g., 'spam', "Sam's spam", or
"""
Hi there I can type as
many lines as I want
"""
Strings can be added together (newstring = 'spam' + 'alot'). They can be sliced
(newerstring = newstring[0:3]). But they CANNOT be changed in place - you can't
do this: newstring[0]='b'. To find more of the things you can and cannot do to
strings, see: http://docs.python.org/tutorial/introduction.html#strings
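
A sketch of the three string rules just described. The names changed_in_place and replaced are made up for the illustration, and the try/except lines (not yet introduced in these notes) are there only to catch the error so that the script keeps running.

```python
newstring = 'spam' + 'alot'      # 'spamalot': + joins strings
newerstring = newstring[0:3]     # 'spa': slicing works as for lists

try:
    newstring[0] = 'b'           # strings cannot be changed in place
except TypeError:
    changed_in_place = False
else:
    changed_in_place = True

# Build a new string instead of modifying the old one.
replaced = 'b' + newstring[1:]   # 'bpamalot'

print(newstring, newerstring, changed_in_place, replaced)
```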

6.5.3

Data structures as objects

6.5.4

Tuples

What? Tuples? Tuples consist of a number of values separated by commas. They
are denoted with parentheses.
>>> t = 1234, 2.0, 'hello'
>>> t
(1234, 2.0, 'hello')
>>> t[0]
1234


Tuples are sort of like lists, but like strings, their elements cannot be changed.
However, you can slice, concatenate, etc. For more see:
http://docs.python.org/tutorial/datastructures.html#tuples-and-sequences
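
A sketch of what you can and cannot do with a tuple. The names tail, longer and assignment_failed are made up for the illustration; the try/except lines (not yet introduced in these notes) only catch the error so that the script keeps running.

```python
t = 1234, 2.0, 'hello'        # the parentheses are optional when defining a tuple

first = t[0]                  # 1234
tail = t[1:]                  # (2.0, 'hello'): slicing gives a new tuple
longer = t + ('world',)       # concatenation; note the comma in a one-item tuple

try:
    t[0] = 5                  # elements of a tuple cannot be changed
except TypeError:
    assignment_failed = True
else:
    assignment_failed = False

print(first, tail, longer, assignment_failed)
```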

6.5.5

Dictionaries!

Dictionaries are denoted by {}. They are also somewhat like lists, but instead of
integer indices, they use alphanumeric keys. I love dictionaries. So here is a bit
more about them.
To define one:
>>> Telnos={'lisa':46084,'lab':46531,'jeff':44707} # defines a dictionary
To return the value associated with a specific key:
>>> Telnos['lisa']
46084
To change the value associated with a key:
>>> Telnos['lisa']=46048
To add a new key and value:
>>> Telnos['newguy']=48888
Dictionaries also have some methods. One useful one is:
>>> Telnos.keys()
['lisa', 'lab', 'jeff', 'newguy']
which returns a list of all the keys (the order in which they come out is not guaranteed).
For a more complete accounting of dictionaries, see:
http://docs.python.org/tutorial/datastructures.html#dictionaries
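
The dictionary operations above can be collected in one short script. This is a sketch; the in test and the sorted() call are small additions that are not in the session above, used here to check for a key and to get the keys out in a predictable order.

```python
Telnos = {'lisa': 46084, 'lab': 46531, 'jeff': 44707}

Telnos['lisa'] = 46048          # change the value stored under an existing key
Telnos['newguy'] = 48888        # add a new key and value

has_lab = 'lab' in Telnos       # True: test whether a key exists
names = sorted(Telnos.keys())   # sort the keys so the order is predictable

for name in names:              # step through the keys, looking up each value
    print(name, Telnos[name])
```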

6.5.6

N-dimensional arrays

Arrays in Python are more similar to arrays in Fortran than they are to lists. Unlike
lists, arrays have to be all of the same data type (dtype), usually numbers (integers
or floats), although there is also something called a character array. Also,
the size and shape of an array must be known a priori and not determined on the fly
like lists. For example we can define a list with L=[], then append to it as desired,
but not so arrays - they are much pickier and we'll see how to set them up later.
Why use arrays when you can use lists? They are far more efficient than lists,
particularly for things like matrix math. But just to make things a little confusing,
there are several different data objects that are loosely called arrays, e.g., arrays,
character arrays and matrices. These are all subclasses of ndarray. I'm just going
to briefly introduce arrays and matrices here.
Here are a few ways of making arrays:
>>> import numpy
>>> A= numpy.array([[1,2,3],[4,2,0],[1,1,2]])
>>> A
array([[1, 2, 3],
[4, 2, 0],
[1, 1, 2]])
>>> B=numpy.arange(0,10,1).reshape(2,5)
>>> B
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> C=numpy.array([[1,2,3],[4,5,6]],numpy.int32)
>>> C
array([[1, 2, 3],
[4, 5, 6]])
>>> D=numpy.zeros((2,3)) # Notice the zeros and the size is specified by a tuple.
>>> D
array([[ 0., 0., 0.],
[ 0., 0., 0.]])
>>> E=numpy.ones((2,4))
>>> E
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
>>> F=numpy.linspace(0,10,14)
>>> F
array([  0.        ,   0.76923077,   1.53846154,   2.30769231,
         3.07692308,   3.84615385,   4.61538462,   5.38461538,
         6.15384615,   6.92307692,   7.69230769,   8.46153846,
         9.23076923,  10.        ])
>>> G=numpy.ndarray(shape=(2,2), dtype=float)
>>> G # note that this is NOT initialized: it holds whatever was in memory (here really small numbers, but not zeros).
array([[  1.90979621e-313,   2.75303490e-308],
       [  1.08539798e-071,   3.05363949e-309]])
Note the difference between linspace(start,stop,N) and arange(start,stop,step).
The function linspace creates an array with N evenly spaced elements (14 in the example above) between the
start and stop values, including both ends, while arange creates an array with elements at step
intervals from the start value up to, but not including, the stop value. In some of the online examples
you will find the short-cuts for linspace() and arange() as numpy.r_[-5:5:20j] and numpy.r_[-5:5:1.]
respectively.
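
The difference between the two is mostly about the end point. The sketch below is NOT how NumPy does it; linspace_sketch and arange_sketch are hypothetical pure-Python stand-ins (assuming N is at least 2 and the step is positive) that simply spell out which values each call is expected to produce.

```python
def linspace_sketch(start, stop, n):
    """Return n evenly spaced values from start to stop, both ends included."""
    step = (stop - start) / float(n - 1)
    return [start + i * step for i in range(n)]


def arange_sketch(start, stop, step):
    """Return start, start+step, ... for as long as the value stays below stop."""
    values = []
    i = 0
    while start + i * step < stop:
        values.append(start + i * step)
        i += 1
    return values


F = linspace_sketch(0, 10, 14)   # 14 values; the last one is the stop value (to rounding)
B = arange_sketch(0, 10, 1)      # 0, 1, ..., 9: the stop value is NOT included

print(len(F), F[0], F[1])
print(B)
```

So with linspace you choose how many points you get and the spacing follows, while with arange you choose the spacing and the number of points follows.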
Python arrays have attributes and methods like dtype, ndim, shape, size, reshape(), ravel(),
transpose() etc. Did you notice how some of these require parentheses and some
don't? The answer is that the ones with parentheses are methods (functions attached to the
array) and the ones without are attributes (values stored with the array). We will get to
both later.
Let's see what these can do. First, arrays made in the above example are
of different data types. To find out what data type an array is, just use the attribute
dtype as in:
>>> D.dtype
dtype('float64')
And of course arrays, unlike lists have dimensions and shape. Dimensions tell
us how many axes there are with axes defined as in this illustration:

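[Figure: (a) sketch of the array axes, labeled axis = 0 and axis = 1; (b) the A array
[[1, 2, 3], [4, 2, 0], [1, 1, 2]], with axis = 0 running down the rows and axis = 1
running across the columns.]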
As shown above our A array has two dimensions (axis 0 and 1). To get Python
to tell us this, we use the ndim method:
>>> A= numpy.array([[1,2,3],[4,2,0],[1,1,2]]) # just to remind you
>>> A.ndim
2
Notice how zeros, ones and ndarray used a shape tuple in order to define the
arrays in the examples above. The shape of an array is how many elements are along
each axis. So, naturally we see that the C array is a 2x3 array. Python returns a
tuple with the shape information using the shape method:
>>> C.shape
(2, 3)
Let's say we don't want a 2x3 array for the sequence in the array C, but we want
a 3x2 array. Python can reshape an array with a different shape tuple like this:
>>> C.reshape((3,2))
array([[1, 2],
[3, 4],
[5, 6]])



And sometimes we just want all the elements lined up along one axis. We could
do that with reshape of course using a tuple with the size of the array (the total
number of elements). You can see that this is 6 here. We could even get Python to
tell us what the size is (C.size) and use that in the reshape size tuple. Alternatively
we can use the ravel() method which doesn't require us to know the size in advance:
>>> C.ravel()
array([1, 2, 3, 4, 5, 6])
There are other ways to reshape, slice and dice arrays. The syntax for slicing of
arrays is similar to that for lists:
>>> B=A[0:2] # carve the top two lines off of matrix A from above
>>> B
array([[1, 2, 3],
       [4, 2, 0]])
Lots of applications in Earth Science require the transpose of an array:
>>> A.transpose() # this is the same as A.T
array([[1, 4, 1],
       [2, 2, 1],
       [3, 0, 2]])
Also, we can concatenate two arrays together with the - you guessed it - numpy.concatenate() function. For a lot more tricks with arrays, go to the NumPy Reference
website here: http://docs.scipy.org/doc/numpy/reference/.
We promised to tell you about matrix objects, so here goes. A matrix is another
subclass of ndarray with special advantages for particular applications. While arrays
are intended to be general purpose n-dimensional arrays for numerical computing,
matrices are better for linear algebra type problems: you can take the inverse and
find the Hermitian (conjugate transpose) more easily (.I and .H) and matrix multiplication works like it
does in Matlab for matrix objects, whereas arrays do element-wise computations
(we'll learn more about this later). Check out this website for more differences
between the two: http://www.scipy.org/NumPy_for_Matlab_Users.
To convert the A array to a list: L=A.tolist(), from a list or tuple to an array: A=numpy.array(L), or from a list, a tuple or an array to a NumPy array:
a=numpy.asarray(L)

6.6

Python Scripts

Are you tired of typing yet? Like UNIX shell scripts, Python scripts are programs
that can be run and re-run from the command line. You can type in the same


stuff you've been doing interactively into a script file (ending in .py). You can edit
scripts with: vi, TextWrangler, Xcode, emacs, etc. NOT Word! And then you can
run them like this:
% python < myscript.py
Or you can put in a header line identifying the script as Python (#!/usr/bin/env
python), make it executable (chmod a+x), and run it like this:
% myscript.py
Here is a familiar example that creates a script using the UNIX cat command,
makes it executable and then runs it:
% cat > printmess.py
#!/usr/bin/env python
# simple Python test program (printmess.py)
print 'test message'
^D
% chmod a+x printmess.py
% ./printmess.py
test message
In a Python script that is run directly like this, the first line MUST be:
#! /usr/bin/env python
so that the file is interpreted as Python. Unlike Fortran or C, you CANNOT start
with a comment line (try switching lines 1 and 2 and see what happens).
The second line is a comment line. Anything to the right of # is assumed to be
a comment (remember that in Fortran ! serves the same function).
Notice that print goes by default to your screen. Because the message is a string,
you can use single or double quotes for the test message. You can get an apostrophe
in your output by using double quotes and quote marks by using single quotes, i.e.,
#! /usr/bin/env python
# simple Python test program 2 (printmess2.py)
print "The pump don't work 'cuz the vandals took the handles"
print 'She said "I know what it\'s like to be dead"'

produces:
% ./printmess2.py
The pump don't work 'cuz the vandals took the handles
She said "I know what it's like to be dead"
%
%


In the second print statement, the \ is necessary to prevent an error (try it). This
is an example of a Python escape code. These are used to "escape" some special
meaning, as in an end-quote for a string in this example. We use the backslash to
say that we really really want a quote mark here. Other escape codes are listed
here:
http://www.python-course.eu/variables.php
Here's another example of a program - this one has a typo in line 4:
#! /usr/bin/env python
abeg = 2.1
aend = 3.9
adif = aend - abge
print 'adif = ', adif
You had intended to type abeg but typed abge instead. When you run the
program, you get an error message:
Traceback (most recent call last):
  File "./undeclared.py", line 4, in <module>
    adif = aend - abge
NameError: name 'abge' is not defined
Error messages are a desirable feature of Python. You don't want the program to
run by assigning some arbitrary value to abge and giving you a wrong answer. Yet
many languages will do exactly that, including Fortran (we can avoid this potential
problem in Fortran by using the implicit none statement at the beginning of our
programs).
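
If you want to see the NameError without having the script die, you can catch it. This is only a sketch: try/except has not been introduced in these notes, and the name message is made up for the illustration.

```python
abeg = 2.1
aend = 3.9

try:
    adif = aend - abge        # typo: abge was never assigned
except NameError as err:
    message = str(err)        # the text of the error, which names the culprit
    adif = aend - abeg        # the calculation that was intended

print(message)
print('adif = %.1f' % adif)
```

In a real program you would of course fix the typo rather than catch the error; the point is only that Python refuses to invent a value for abge.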
ASSIGNMENT P1
Write a Python script to print your favorite pithy phrase. e-mail your script to
ltauxe@ucsd.edu

6.7

A first look at code blocks

Any reasonable programming language must provide a way to group blocks of code
together, to be executed under certain conditions. In Fortran, for example, there
are if statements and do loops which are bounded by the statements if, endif and
do, enddo respectively. Many of these languages encourage the use of indentation
to make the code more readable, but do not require it. In Python, indentation is
the way that code blocks are defined - there are no terminating statements. Also,
the initiating statement terminates in a colon. The trick is that all code indented
the same number of spaces (or tabs) to the right belongs together. The code block
terminates when the next line is less indented. A typical Python program looks like
this:
program statement
block 1 top statement:
    block 1 statement
    block 1 statement \
 ha-ha i can break the indentation convention!
    block 1 statement
    block 2 top statement:
        block 2 statement
        block 2 statement
        block 3 top statement:
            block 3 statement
            block 3 statement
            block 4 top statement: block 4 single line of code
        block 2 statement
        block 2 statement
    block 1 statement
    block 1 statement
program statement
Exceptions to the code indentation rules are:
Any statement can be continued on the next line with the continuation character \ and the indentation of the following line is arbitrary.
If a code block consists of a single statement, then that may be placed on the
same line as the colon.
The command break breaks you out of the enclosing for or while loop. Use with caution!
There is a cheat that comes in handy when you are writing a complicated
program and want to put in the code blocks but don't want them to DO
anything yet: the command pass does nothing and can be used to stand in for
a code block.
Good housekeeping Tip #2: Always use only spaces or only tabs in your code
indentation. I use only spaces because I use vi to write my code. Others use Xcode,
the Python IDLE program, or TextWrangler to write their code and some of these
things use tabs by default. Whatever you do BE CONSISTENT because tabs are
not the same as spaces in Python even if you can't tell the difference just by looking
at it.
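
Here is a small runnable sketch of the indentation rules and the exceptions listed above (a single statement on the same line as the colon, break, and pass). The list of values and the names small and large are made up for the illustration.

```python
values = [3, 8, 15, -1, 4]
small = []
large = []

for v in values:                  # block 1 starts after the colon
    if v < 0:                     # block 2, nested inside block 1
        break                     # leave the for loop entirely
    if v < 10:
        small.append(v)
    else:
        pass                      # placeholder: nothing to do here yet
    if v >= 10: large.append(v)   # a single statement on the same line as the colon

print(small, large)               # back at the outer level: runs once, after the loop
```

The loop stops at -1, so the final 4 is never looked at.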

In the following, I'll show you how Python uses code blocks to create for and
while loops, and if statements.

6.7.1

The for loop

Here is an example of a for loop that is similar to the way you would do it in
Fortran:
#!/usr/bin/env python
mylist=[42,'spam','ocelot']
for i in range(0,len(mylist),1): # note the start, stop and step in range()
    print mylist[i]
print 'All done'
This script creates the list mylist with the line mylist=[42,'spam','ocelot']. The
length of mylist is an integer value returned by len(mylist). The script uses this
integer as the stop value in the range() function, which returns a list of integers
from 0 to the stop value MINUS ONE at intervals of one. [The minus one convention is hard to get used to for Fortran programmers, but it is typical of Python
syntax (and also of C) so just deal with it.] Anyway, range(start,stop,step) is just
like numpy.arange(start,stop,step) but accepts only integers and returns a list instead of an array. Also, like
numpy.arange(), there is a short hand form when the minimum is zero and the
interval is one, so we could (and will) just use the command range(stop).
Python makes i step through the list of numbers from 0 to 2, printing the ith
element of mylist. Note how the print command is indented - this is the program
block that is executed for each i. Note also that the print statement could have been on the
previous line after the colon, because there is only one line in the program block.
But never mind, this way works too and is more Fortran like. When i finishes its
business, the program block terminates. At that point, the program prints out the
'All done' string. There is no enddo statement or equivalent in Python.
But, Python is far more fun than the Fortran-like for i in syntax in the above
code snippet. In Python we can just step through a list directly. Here is another
script which does just that (why not?):
#!/usr/bin/env python
mylist=[42,'spam','ocelot']
for item in mylist: # note absence of range statement
    print item
print 'All done'


Note that of course we could have used any variable name instead of item, but it
makes sense to use variable names that mean what they do. It is easier to understand
what item stands for than just the Fortran style of i.
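
The two styles of for loop can be put side by side to check that they visit the same items. This is a sketch with made-up names (by_index, by_item, every_other); list() is wrapped around range() so that the last line gives a list on a current Python as well as on 2.7.

```python
mylist = [42, 'spam', 'ocelot']

by_index = []
for i in range(len(mylist)):          # i takes the values 0, 1, 2
    by_index.append(mylist[i])

by_item = []
for item in mylist:                   # item takes each element in turn
    by_item.append(item)

every_other = list(range(0, 10, 2))   # start 0, stop 10 (not included), step 2

print(by_index == by_item)
print(every_other)
```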
Here is an example with a little more heft to it. It creates a table of trigonometry
functions, spitting them out with a formatted print statement:
#! /usr/bin/env python
import numpy as np
deg2rad = np.pi/180. # remember conversion to radians
for theta in range(90): # short form of range, returns [0,1,2...89]
    ctheta = np.cos(theta*deg2rad) # define ctheta as cosine of theta
    stheta = np.sin(theta*deg2rad) # define stheta as sine of theta
    ttheta = np.tan(theta*deg2rad) # define ttheta as tangent of theta
    print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)

Let's pick this one apart a bit. First, notice the use of the variable deg2rad to
convert from degrees to radians. Also notice how deg2rad is defined: deg2rad =
np.pi/180. using the NumPy constant for π and the decimal point after 180. While
in this case, it makes absolutely no difference (try it!), it is a good practice to use
real numbers if you want your variable to stay real. In fact:
Good housekeeping Tip #3: Always use a decimal if you want your variable to
be a floating point variable.
The expression ctheta = np.cos(theta*deg2rad) uses the numpy cosine function.
Ideally theta should be a real variable while in fact it is an integer in this expression,
but fortunately Python figures that out and converts it to a real. Note that we could
have also converted theta to a float first with the command float(theta).
print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)
To make the output look nice, we do not use
print theta, ctheta, stheta, ttheta
which would space the numbers irregularly among the columns and put out really
long numbers. Instead, we explicitly specify the output format. The output format
is given in the quotes. The format for each number follows a %: 5.1f is for 5
spaces of floating point output, with 1 space to the right of the decimal point (in
Fortran this is f5.1). The single blank space between %5.1f and %8.4f is included in
the output. In fact any text there is reproduced exactly in the output, thus to put
commas between the output numbers, write:

print '%5.1f, %8.4f, %8.4f, %8.4f' %(theta, ctheta, stheta, ttheta)
Tabs (\t) would be formatted like this:
print '%5.1f \t %8.4f\t %8.4f,\t %8.4f' %(theta, ctheta, stheta, ttheta)

6.7.2

If and while blocks

The for loop is just one way of controlling flow in Python. There are also if and
while code blocks. These execute code blocks the same way as for loops (colon
terminated top statements, indented text, etc.). For both of these, the code block is
executed if the top statement is TRUE. For the if block, the code is executed once
but in a while block, the code keeps executing as long as the statement remains
TRUE.
The key to flow control therefore is in the top statement of each code block; if
it is TRUE, then execute, otherwise skip it. To decide if something is TRUE or not
(in the boolean sense), we need to evaluate a statement using comparisons. You
know all about comparisons from Fortran. Python of course also has comparisons
and they work in similar ways with a few differences. Here's a handy table with
comparisons (relational operators) in different languages:
Comparisons
F77     F90     C     MATLAB   PYTHON   meaning
.eq.    ==      ==    ==       ==       equals
.ne.    /=      !=    ~=       !=       does not equal
.lt.    <       <     <        <        less than
.le.    <=      <=    <=       <=       less than or equal to
.gt.    >       >     >        >        greater than
.ge.    >=      >=    >=       >=       greater than or equal to
.and.   .and.   &&    &        and      logical and
.or.    .or.    ||    |        or       logical or
These operators can be combined to make complex tests. Here is a juicy complicated statement:
if ( (a > b and c <= 0) or d == 0):
    code block

There are rules for the order of operations for these things; for example, multiplication gets
done before addition. But these are easy to forget. You can look them up in the
documentation if you are unsure or, better, just put in enough parentheses to make
it completely clear to anyone reading your code.
Good housekeeping Tip #4: Use parentheses liberally - make the order of operation
completely unambiguous even if you could get away with fewer.
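
To see what Tip #4 buys you, here is a sketch that evaluates the same kind of test with and without parentheses. The numbers are made up; the one rule being relied on is that Python evaluates comparisons first, then and, then or.

```python
a, b, c, d = 5, 3, 0, 7

# With parentheses the grouping is explicit.
explicit = ((a > b) and (c <= 0)) or (d == 0)

# Without them Python applies its own rules: comparisons, then and, then or.
implicit = a > b and c <= 0 or d == 0

# Moving the parentheses changes the question being asked.
a = 1
d = 0
grouped_and_first = ((a > b) and (c <= 0)) or (d == 0)   # (False and True) or True
grouped_or_first = (a > b) and ((c <= 0) or (d == 0))    # False and (True or True)

print(explicit, implicit)
print(grouped_and_first, grouped_or_first)
```

The first pair agree, which is the "could get away with fewer" case; the second pair differ, which is why it pays to write the grouping you mean.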
One nice aspect of Python compared to C is that if you make a mistake and
type, for example,
if (a = 0):

you will get an error message during compilation. In C this is a valid statement
with a completely different meaning than is intended!
Finer points of if blocks
The simplest if block works just like we have described:
if (2+2)==4: # note the use of == and parentheses in comparison statement
    print "I can put two and two together!"
However, as in Fortran (and C and any other reasonable programming language),
there are bells and whistles to the if code blocks. In Python these are elif and
else. As with the Fortran equivalent else if, the elif code block gets executed if the top
if statement is FALSE and the elif statement is TRUE. The else statement has no test
of its own: if the top if and every elif statement are FALSE, Python will execute
the block following the else. Consider these examples:
#!/usr/bin/env python
mylist=['jane','doug','denise']
if 'susie' in mylist:
    pass # don't do anything
if 'susie' not in mylist:
    print 'call susie and apologize!'
    mylist.append('susie')
elif 'george' in mylist: # if first statement is false, try this one
    print 'susie and george both in list'
else: # if both statements are false, do this:
    print "susie in list but george isn't"

While loops
As already mentioned, the while block continues executing as long as the while
top statement is TRUE. In other words, the if block is only executed once, while
the while block keeps looping until the statement turns FALSE. Here is an
example:
#!/usr/bin/env python
a=1
while a < 10:
    print a
    a+=1
print "I'm done counting!"
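The counting example could just as well have been a for loop. A while loop earns its
keep when you do not know in advance how many passes are needed. Here is a minimal
sketch with a made-up starting value that counts how many times a number can be
halved before it drops below 1:

```python
x = 100.0        # starting value (arbitrary)
halvings = 0     # number of passes through the loop so far
while x >= 1.0:  # keep going as long as the top statement is TRUE
    x = x / 2.0
    halvings += 1
print(halvings)
print(x)
```

If the top statement never turns FALSE the loop never ends, so it is worth checking
that something inside the block really does change the variable being tested.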

6.7.3

Code blocks in interactive Python

All of these program blocks can also be typed in an interactive session, again using
indentation. The interactive shell responds with ... instead of >>> once you
type a statement it recognizes as a top statement. To signal that you are done with
the program block, simply hit return on an empty line:
>>> a=1
>>> while a<10:
...     print a
...     a+=1
... [return to execute block]
ASSIGNMENT P2
Rewrite your Fortran Assignment F4 in Python. E-mail it to ltauxe@ucsd.edu

6.8

File I/O in Python

Python would be no better than a rather awkward graphing calculator (and we
haven't even gotten to the graphing part yet) if we couldn't read data in and spit
data out. You learned a rudimentary way of spitting stuff out already using the
print statement, but there is a lot more to file I/O in Python. We would like to be
able to read in a variety of file formats (not just X,Y in the first two columns as in
GMT) and output the data any way we want. In the following we will explore some
of the more useful I/O options in Python.

6.8.1

Reading data in

From a file
If you are using Python interactively or want interactivity in a script, use the
command raw_input(). It acts as a prompt and reads in whatever is typed before
a return, as a string.

X=[] # make a list to put the data in
ans=float(raw_input("Input numeric value for X: "))
X.append(ans) # append the value to X
print X[-1] # print the last item in the list

In this example, whatever is typed is read in as a string, converted to a float,
stored in the variable ans and appended to the list X. raw_input() is a simple but
rather annoying way to enter things into a program. Another (less annoying) way is
to put the data in a file (e.g., myfile.txt) with cat, paste, Excel (saved as a text
file), or whatever and read it into Python. The approach to this is similar to Fortran:
we must first open the file, then read it in and parse lines into the desired variables.
To open a file we use the command open(), one of Python's built-in functions.
For a complete list of these, see:
http://docs.python.org/library/functions.html
The open() function returns an object, complete with methods, like readlines()
which, yes, reads all the lines. Here is a script (ReadStations.py) that will open the
file station.list from the chapter on GMT, read in the data and print it out line by
line.
#!/usr/bin/env python
f=open('station.list')
StationNFO=f.readlines()
for line in StationNFO:
    print line
If you run this script, you will get this behavior:
% ReadStations.py
9.02920 38.76560 2442 AAE

42.63900 74.49400 1645 AAK

37.93040 58.11890 678 ABKT

51.88370 -176.68440 116 ADK

etc.

The function open() has some bells and whistles to it and has the form open(name[,
mode[, buffering]]) where the stuff in square brackets is optional. The name argument
is the file name to open and mode is the way in which it should be opened,
most commonly for reading 'r', writing 'w' or appending 'a'. I use the form 'rU'
(universal newline mode) for reading because I often want to read in files that were
saved with Dos, Mac OR Unix line endings and 'rU' figures all that out for you. Just in
case you are curious, Unix lines end in \n, (old-style) Mac lines in \r and Dos (and
Windows) lines end in \r\n. I never use the buffering argument and don't know what
it does.
If you are curious about the line endings, try printing the representation
of the line, repr(line), in the above script and you get all the stuff that is normally
invisible, like the apostrophes and the line terminations:
% ReadStations.py
'9.02920 38.76560 2442 AAE \n'
'42.63900 74.49400 1645 AAK \n'
'37.93040 58.11890 678 ABKT\n'
'51.88370 -176.68440 116 ADK \n'
'-13.90930 -171.77730 706 AFI \n'
etc.
Notice how in our first version, printing the line also printed the line feed (\n) as
an extra line. To clean this off of each line, we can use the string strip() method:
print line.strip('\n')
Putting this into the code results in this behavior:
% ReadStations.py
9.02920 38.76560 2442 AAE
42.63900 74.49400 1645 AAK
37.93040 58.11890 678 ABKT
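If a file was not opened in universal newline mode, the carriage returns are still
there, and strip('\n') leaves them alone. The sketch below uses three made-up test
strings (the same station record with Unix, Dos and old Mac endings) to show the
difference between strip('\n') and rstrip('\r\n'), which removes any mix of \r and \n
from the right-hand end only:

```python
# the same station record with the three kinds of line endings (made-up test strings)
unix_line = "9.02920 38.76560 2442 AAE \n"
dos_line = "9.02920 38.76560 2442 AAE \r\n"
mac_line = "9.02920 38.76560 2442 AAE \r"

# strip('\n') only removes the line feed, so the Dos and Mac versions keep their \r
print(repr(unix_line.strip('\n')))
print(repr(dos_line.strip('\n')))
print(repr(mac_line.strip('\n')))

# rstrip('\r\n') removes both kinds of terminator from the end of the line
cleaned = [line.rstrip('\r\n') for line in (unix_line, dos_line, mac_line)]
print(cleaned)
```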
Let's say you want to read the data table into lists called Lats, Lons, and
StaIDs (the first three columns). You need to split each line into its columns and
append the correct column into the appropriate list. Fortran automatically splits
on the spaces so you probably didn't have to worry about this sort of thing yet, but
Python reads in the entire line as a single string and attaches no special meaning to
the spaces or other possible delimiters (commas, semi-colons, tabs, etc.). To split the
line, we use the string method split([sep]) where [sep] is an optional separator. If no
separator is specified (e.g., line.split()), it will split on any run of whitespace
(spaces, tabs, line endings). Anything could be a separator, but the
most common ones are ',', ';', and '\t'. The latter is how a tab appears if you were
to, say, print out the representation of the line, which shows all the invisibles.
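Here is a short sketch of split() at work on three made-up versions of the same
station record, one separated by spaces, one by commas and one by tabs. The variable
names are invented for this example:

```python
space_line = "9.02920   38.76560 2442 AAE \n"   # columns separated by runs of spaces
comma_line = "9.02920,38.76560,2442,AAE\n"
tab_line = "9.02920\t38.76560\t2442\tAAE\n"

# no separator: split on any run of whitespace, including the line ending
print(space_line.split())

# explicit separators: strip the line ending first, then split
print(comma_line.strip('\n').split(','))
print(tab_line.strip('\n').split('\t'))

# with an explicit separator every single occurrence splits, so two
# separators in a row produce an empty string
print("a,,b".split(','))
```

Note that every item in the resulting lists is still a string, which is why the
scripts in this section wrap the columns in float() or int().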
Here is a slightly modified version of ReadStations.py, ParseStations.py which
parses out the lines and puts numbers (floats or integers) in the right lists:
#!/usr/bin/env python
Lats,Lons,StaIDs,StaName=[],[],[],[] # creates lists to put things in
StationNFO=open('station.list').readlines() # combines the open and readlines methods!
for line in StationNFO:
    nfo=line.strip('\n').split() # strips off the line ending and splits on spaces
    Lats.append(float(nfo[0])) # puts float of 1st column into Lats
    Lons.append(float(nfo[1])) # puts float of 2nd column into Lons
    StaIDs.append(int(nfo[2])) # puts integer of 3rd column into StaIDs
    StaName.append(nfo[3]) # puts the ID string into StaName
    print Lats[-1],Lons[-1],StaIDs[-1] # prints out last thing appended

From standard input


As in Fortran, Python can also read from standard input. To do this, we need a
system-specific module called sys, which among other things has a stdin file object.
So, instead of opening a named file with open(), we can read from sys.stdin, as in
the following script:
#!/usr/bin/env python
import sys
Lats,Lons,StaIDs,StaName=[],[],[],[] # creates lists to put things in
StationNFO=sys.stdin.readlines() # reads from standard input
for line in StationNFO:
    nfo=line.strip('\n').split() # strips off the line ending and splits on spaces
    Lats.append(float(nfo[0])) # puts float of 1st column into Lats
    Lons.append(float(nfo[1])) # puts float of 2nd column into Lons
    StaIDs.append(int(nfo[2])) # puts integer of 3rd column into StaIDs
    StaName.append(nfo[3]) # puts the ID string into StaName
    print Lats[-1],Lons[-1],StaIDs[-1] # prints out last thing appended
The program can be invoked with:
% ReadStations.py < station.list
We could also use command line switches by reading in arguments from the
command line. In the following example, we use the switch -f with the following
argument being the file name:

6.8.2

Command line switches

#!/usr/bin/env python
import sys
Lats,Lons,StaIDs,StaName=[],[],[],[] # creates lists to put things in
if '-f' in sys.argv: # look in list of command line arguments
    file=sys.argv[sys.argv.index('-f')+1] # find index of -f and increment by one
StationNFO=open(file,'rU').readlines() # open file
for line in StationNFO:
    nfo=line.strip('\n').split() # strips off the line ending and splits on spaces
    Lats.append(float(nfo[0])) # puts float of 1st column into Lats
    Lons.append(float(nfo[1])) # puts float of 2nd column into Lons
    StaIDs.append(int(nfo[2])) # puts integer of 3rd column into StaIDs
    StaName.append(nfo[3]) # puts the ID string into StaName
    print Lats[-1],Lons[-1],StaIDs[-1] # prints out last thing appended

This version can be invoked with:
% ReadStations.py -f station.list
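The script above stops with an error if -f is left off the command line, because the
variable file never gets set. One possible way around this (a sketch only; the helper
get_switch and the list fake_argv are names made up for this example) is to wrap the
lookup in a small function that falls back on a default:

```python
def get_switch(argv, flag, default=None):
    """
    returns the argument that follows flag in argv, or default if
    the flag is missing or is the last item on the command line
    """
    if flag in argv:
        ind = argv.index(flag)
        if ind + 1 < len(argv):
            return argv[ind + 1]
    return default

# in a real script you would pass in sys.argv; a made-up list is used here
fake_argv = ['ReadStations.py', '-f', 'station.list']
print(get_switch(fake_argv, '-f'))
print(get_switch(fake_argv, '-o', 'no output file given'))
```

In a script the call would look like get_switch(sys.argv, '-f', 'station.list'),
after an import sys at the top.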

Reading numeric files


In the special case where the data in a file are entirely numeric, you can read in the
file with a special NumPy function, loadtxt(). This reads the data into a two-dimensional
NumPy array in which each row holds the numbers from one line of the file.

6.8.3

Writing data out

Let's say I have a Python module that will convert latitudes and longitudes to UTM
coordinates. O.K. I really do have one that I downloaded from here:
http://code.google.com/p/pyproj/issues/attachmentText?id=27&aid=-80884174771817564&name=UTM.py&token=46ab62caa041c3f240ca0e55b7b25ad6
I wrote a script (ConvertStations.py) to convert each of the stations in my list
to their UTM equivalents (assuming these were in a WGS-84 ellipsoid). It would
be nice if, after having done this to the data, I could then write it out somehow,
preferably to a file. Of course I could use the print command like this:
#!/usr/bin/env python
import UTM # imports the UTM module
Ellipsoid=23-1 # UTM's code for WGS-84
StationNFO=open('station.list').readlines()
for line in StationNFO:
    nfo=line.strip('\n').split()
    lat=float(nfo[0])
    lon=float(nfo[1])
    StaName= nfo[3]
    Zone,Easting, Northing=UTM.LLtoUTM(Ellipsoid,lon,lat)
    print StaName, ':', Easting, Northing, Zone

which spits out something like this:
% ConvertStations.py
AAE : 474238.170087 998088.469113 37P
AAK : 458516.115522 4720850.45385 43T
ABKT : 598330.712671 4198681.92944 40S
ADK : 521722.179764 5748148.625 1U
AFI : 416023.683618 8462168.07766 2L
etc.
I could save the output with a UNIX re-direct:
ConvertStations.py > mynewfile

But we yearn for more. So, more elegantly, I can open an output file [for appending
'a' or (over)writing 'w'] and write a formatted string to it, using the write method of
the output file object with a format string:
#!/usr/bin/env python
import UTM # imports the UTM module
outfile=open('mynewfile','w') # creates outfile object
Ellipsoid=23-1 # UTM's code for WGS-84
StationNFO=open('station.list').readlines()
for line in StationNFO:
    nfo=line.strip('\n').split()
    lat=float(nfo[0])
    lon=float(nfo[1])
    StaName= nfo[3]
    Zone,Easting, Northing=UTM.LLtoUTM(Ellipsoid,lon,lat)
    outfile.write('%s %s %s %s\n'%(StaName, Easting, Northing, Zone))
The only significant changes are 1) the object outfile is opened for writing (note
that this will clobber anything in a pre-existing file by that name) and 2) the output
file gets written to in the statement with a write method on the output file object:
outfile.write('%s %s %s %s\n'%(StaName, Easting, Northing, Zone))
The write statement uses the syntax: 'format string'%(tuple of variables). Format
strings have these rules:
- For each variable in the tuple you need a format: %s for string, %i for
  integer, %f for float, %e for exponential notation.
- You can also specify further, e.g.: %7.1f for 7 characters with 1 after the
  decimal, or %10.3e for 10 characters with 3 after the decimal,
  where the number of characters includes the decimal point and padded spaces.
As noted before, the format string can include punctuation:
x,y=4.82,2.3e3
print '%7.1f,%s\t%10.3e'%(x,'hi there',y)
    4.8,hi there         2.300e+03
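To see exactly where the padding goes, it helps to look at the representation of each
formatted piece on its own. This is a small sketch with made-up values; repr() is used
so that the leading and trailing spaces show up between the quotes:

```python
x, y = 4.82, 2.3e3

print(repr('%7.1f' % x))      # 7 characters wide, 1 after the decimal
print(repr('%10.3e' % y))     # 10 characters wide, 3 after the decimal
print(repr('%5i' % 42))       # integer padded on the left to 5 characters
print(repr('%-6s|' % 'AAE'))  # a minus sign pads on the right instead
print(repr('%s %i %f' % ('AAE', 2442, 9.0292)))  # %f defaults to 6 decimals
```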

In the ConvertStations2.py script, the \n at the end of the format string puts a UNIX
line ending on each line. Without that, the whole file is but a single line (very annoying).
A session using the script (ConvertStations2.py) and a peek at the resulting file could
look like this:
% ConvertStations2.py
% head mynewfile
AAE 474238.170087 998088.469113 37P
AAK 458516.115522 4720850.45385 43T
ABKT 598330.712671 4198681.92944 40S
ADK 521722.179764 5748148.625 1U
AFI 416023.683618 8462168.07766 2L
ALE 509467.666259 9161062.29194 20X
ALQ 366981.843985 3868044.56906 13S
ANMO 366981.843985 3868044.56906 13S
ANTO 482347.254856 4413225.7807 36S
AQU 368638.770654 4690300.1797 33T
I'm assuming you know what the UNIX head command does or at least how to find
out!
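The script above never closes its output file and relies on Python to do that when
the program ends. A common alternative, sketched here under some assumptions, is the
with statement, which closes the file as soon as the indented block is finished. The
records, the fixed-width format and the scratch file name below are all made up for
this example, and the file is written to the system temporary directory and removed
at the end:

```python
import os
import tempfile

# made-up records: (station name, easting, northing, zone)
records = [('AAE', 474238.170087, 998088.469113, '37P'),
           ('AAK', 458516.115522, 4720850.45385, '43T')]

# scratch file in the system temporary directory (hypothetical name)
outname = os.path.join(tempfile.gettempdir(), 'mynewfile_example.txt')

with open(outname, 'w') as outfile:   # the file is closed when the block ends
    for name, easting, northing, zone in records:
        outfile.write('%-4s %12.3f %12.3f %s\n' % (name, easting, northing, zone))

# read the file back to check what was written
with open(outname) as infile:
    lines = infile.readlines()
for line in lines:
    print(line.rstrip('\n'))

os.remove(outname)   # tidy up the scratch file
```

Using fixed widths such as %12.3f instead of %s keeps the columns lined up, which makes
the file easier to read by eye and by Fortran.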

6.9

Functions

So far you have learned how to use functions from program modules like NumPy.
You can imagine that there are many bits of code that you might write that you
will want to use again and again, say converting between degrees and radians and
back, or finding the great circle distance between two points on Earth, or converting
between UTM and latitude/longitude coordinates (as in UTM.py, my new favorite
package). In Fortran, you learned about subroutines and functions which do this
sort of work for you. In Python, we also have a way to do this of course. The basic
structure of a program with a Python function is:
#!/usr/bin/env python
def FUNCNAME(in_args):
    """
    DOC STRING
    """
    # some code in which the function does something
    return out_args
FUNCNAME(in_args) # this calls the function

6.9.1

Line by line analysis

def FUNCNAME(in_args):
The first line must have def as the first three letters, must have a function name
with parentheses and a terminal colon. If you want to pass some variables to the
function, they go where in_args sits, separated by commas. Unlike in Fortran, there
are no output variables here.
There are four different ways to handle argument passing.
1) You could have a function that doesn't need any arguments at all:
#!/usr/bin/env python
def gimmepi():
    """
    returns pi
    """
    return 3.141592653589793
print gimmepi()
2) You could use a Fortran-like style, where there is a set list of what are called
formal variables that must be passed:
#!/usr/bin/env python
def deg2rad(degrees):
    """
    converts degrees to radians
    """
    return degrees*3.141592653589793/180.
print '42 degrees in radians is: ',deg2rad(42.)
3) You could have a more flexible need for variables. You signal this by putting
*args in the in_args list (along with any formal variables you want):
#!/usr/bin/env python
def print_args(*args):
    """
    prints argument list
    """
    print 'You sent me these arguments: '
    for arg in args:
        print arg
print_args(1,4,'hi there')
print_args(42)


4) You can use a keyworded, variable-length list by putting **kwargs in for in_args:
#!/usr/bin/env python
def print_kwargs(**kwargs):
    """
    prints keyworded argument list
    """
    for key in kwargs:
        print '%s %s' %(key, kwargs[key])
print_kwargs(arg1='one',arg2=42,arg3='ocelot')
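The styles can be mixed in a single function, as long as the formal variables come
first, then *args, then **kwargs. Here is a small sketch (describe_call is a name made
up for this example) that reports how its arguments arrived: the extra positional
arguments show up as a tuple and the keyworded ones as a dictionary, sorted here so
that the output is always in the same order:

```python
def describe_call(first, *args, **kwargs):
    """
    returns a string describing how the arguments were received
    """
    return 'first=%s args=%s kwargs=%s' % (first, args, sorted(kwargs.items()))

print(describe_call(1))
print(describe_call(1, 2, 3))
print(describe_call(1, 2, units='km', name='AAE'))
```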

Doc String
Although you can certainly write functional code without a document string, make
a habit of always including one. Trust me - you'll be glad you did. It can remind
you years later of what you thought you were doing. It can be
used to print out a help message by the calling program and it also lets others
know what you intended. Notice the use of the triple quotes before and after the
documentation string - that means that you can write as many lines as you want.
Function body
This part of the code must be indented, just like in a for loop, or other block of
code.
Return statement
You don't need this unless you want to pass back information to the calling body
(print_kwargs() above, for example, has no return statement). But unlike in Fortran,
where variables are passed back the same way they get in, through the first line,
Python separates the entrance and the exit. See how it can be done in the gimmepi()
example above.

6.9.2

Main program as function

It is considered good Python style to treat your main program block as a function
too. (This helps with using the document string as a help function and building
program documentation in general.) In any case, I recommend that you just start
doing it that way too. In this case, we have to call the main program with the final
(not indented) line main():


#!/usr/bin/env python
def print_kwargs(**kwargs):
    """
    prints keyworded argument list
    """
    for key in kwargs:
        print '%s %s' %(key, kwargs[key])
def main():
    """
    calls function print_kwargs
    """
    print_kwargs(arg1='one',arg2=42,arg3='ocelot')
main() # runs the main program
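A refinement that is not used in these notes, but that you will see in a lot of Python
code, is to guard the final call with a test on the special variable __name__. Python
sets __name__ to '__main__' only when a file is run as a script, so with the guard in
place the same file can also be imported as a module without main() running
automatically. A minimal sketch, reusing the deg2rad function as an example:

```python
def deg2rad(degrees):
    """
    converts degrees to radians
    """
    return degrees*3.141592653589793/180.

def main():
    """
    prints 180 degrees in radians and returns the value
    """
    answer = deg2rad(180.)
    print(answer)
    return answer

# __name__ is '__main__' only when the file is run as a script,
# so importing this file as a module would not call main()
if __name__ == '__main__':
    main()
```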
Notice how in the above examples, all the functions preceded the call to the main
function. This is because Python is interpreted and not compiled - it goes through the
script line by line, so a function must already have been defined by the time the line
that calls it is executed. On the other hand,
we've been running lots of functions and they were not in the program we used to
call them. The trick here is that you can put a bunch of functions in a separate file
(in your path) and import it, just like we did with NumPy. Your functions can then
be called from within your program in the same way as for NumPy.
So let's say I put all the above functions in a file called myfuncs.py:
def gimmepi():
    """
    returns pi
    """
    return 3.141592653589793
def deg2rad(degrees):
    """
    converts degrees to radians
    """
    return degrees*3.141592653589793/180.
def print_args(*args):
    """
    prints argument list
    """
    print 'You sent me these arguments: '
    for arg in args:
        print arg
I could then just import the module myfuncs from within another program, or just
interactively. I can use the functions, or just call for help:
% python
>>> import myfuncs
>>> print myfuncs.gimmepi()
3.14159265359
>>> print myfuncs.print_args.__doc__

    prints argument list

>>>

6.9.3

Scope of variables

As in Fortran, inside a function, variable names have their own meaning which in
many cases will be different from inside the calling function. So, variable names
declared inside a function stay in the function. This is true unless you declare them
to be global. Here is an example in which the main program knows about the
function's variable V:
def myfunc():
    global V
    V=123
def main():
    myfunc()
    print V
main()
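For contrast, here is a sketch of what happens without the global declaration (the
function name set_local and the variable outcome are made up for this example). The
variable V stays inside the function, the main program cannot see it, and the value
has to leave through the return statement instead:

```python
def set_local():
    V = 123        # V exists only inside this function
    return V       # the value leaves through the return statement

def main():
    result = set_local()
    print(result)
    try:
        print(V)   # V was never made global, so the name is unknown here
    except NameError:
        print('V is not defined outside set_local()')
        return result, False
    return result, True

outcome = main()
```

Passing values back with return is usually easier to follow than global, because you
can see at the calling line where each value came from.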
In addition to being able to write your own functions, of course Python has
LOTS of modules and a gazillion functions. The Enthought distribution that you
are using includes plotting, numerical recipes, trig functions, image manipulation,
animation, and many more. We will explore some of these in the rest of the class.
ASSIGNMENT P3:
Write a subroutine module that has these functions:
- a function that returns the bulk parameters relating to the Earth from this
  website:
  http://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html
  Also include the average radius of 6,371 km
- a function that converts degrees to radians
- one that converts radians to degrees
- one that converts longitude and latitude to cartesian coordinates, assuming a
  radius of unity [hint: x = cos(az) cos(pl), y = sin(az) cos(pl), z = sin(pl)]
- one that converts cartesian coordinates back to longitude and latitude
- and one that calculates the great circle distance between two points using the
  numpy.dot() function to get the angular separation and the radius of the Earth
  to get the arc length. Assume the average radius of the Earth (gotten from
  the first function).
Then write a program that takes keyboard entry for a longitude and latitude pair
and prints out the X,Y,Z, converts back and prints the new longitude and latitude
out (as a check) and gives you the great circle distance in km. [HINT: Take a look at
the function SPH_AZI in the Fortran chapter.] E-mail your code to ltauxe@ucsd.edu

6.10

Combining F90 code with Python

Now that you have the very basics of Python programming under your belt, we can try
to pull together the different threads you have been learning by investigating how
to combine F90 with Python. You might ask why not just use Python? or Fortran?
The answer is that there is a lot of Fortran code lying around that you might want
to use, but no way to visualize the data, so you want Python for that. Or you are
solving an extremely computationally intensive problem (global climate model, or
convection in the Earth's core or mantle, or ....) and you need code that is as fast as
possible. Although NumPy is compiled and therefore very fast, Fortran is still one of
the fastest things around. For whatever reason, you are taking a class in both Fortran
and Python so it makes pedagogical sense to try to tie the two halves together.
The next question is How?. There are several different approaches to this
ranging from the simple to the sophisticated. The simple approach would be to
create output files with Fortran and then read them in and plot them or whatever with
Python. A more sophisticated approach would be to call subroutines and functions
from a Fortran module which is imported like any other module into Python. This
is more tricky and involves a Python package called f2py which came with your
Python distribution.
I have tested f2py using the gfortran and python packages that were recommended
for this class and it worked fine. However, I also had to get rid of my beloved
antique /usr/local/bin/g77 compiler, which you probably don't have. This is just
a troubleshooting tip. For reference, the URLs for these are:
http://gcc.gnu.org/wiki/GFortranBinaries#MacOS
and
http://www.enthought.com/products/getepd.php


Because these websites change quickly, I also put the .dmg files on the class website:
http://mahi.ucsd.edu/class233/gfortran-and-gcc-4-6-2-RC20111019-Sno-x86-64.dmb
and
http://mahi.ucsd.edu/class233/epd-7.1-2-macosx-i386.dmb
Assuming you have installed everything properly, we will cover three ways
to use f2py in the following. These are:
- Brute force: Try to get f2py to create a compiled module (ending in .so) from
  a standard F90 program, which can then be imported like any other module.
  Because Fortran subroutines have arguments that both go into the subroutine
  and come out, and Python doesn't, f2py has to make guesses as to the use of
  variables and does its best. This works in simple cases, but can fail badly at
  times.
- Signature File: Ask f2py to read through the code, picking out all the variables
  and create what is called a signature file (ending in .pyf). This has f2py's
  guesses as to what the variables are supposed to do. The signature file can
  be edited to supply the correct intent of variables. Then f2py can create the
  compiled module based on what is in the signature file, which functions with
  fewer errors (from bad guesses).
- F90 surgery: The Fortran code itself can be modified to help f2py in
  interpreting variables.

6.10.1

Brute force f2py method:

Let's start with an example using the program gcf2.f90 from the chapter on Fortran.
This has a function to calculate greatest common factors. In fact we don't need all
the fiddly bits at the top - just the function getgcf. Let's save it in a file called
gcf.f90:
function getgcf(x, y)
   implicit none
   integer :: getgcf, x, y, i, z
   do i = 1, min(x,y)
      if (mod(x, i) == 0 .and. mod(y, i) == 0) getgcf = i
   end do
end function getgcf
To compile the function (or the whole program!), use this syntax:


f2py -c gcf.f90 -m gcf


The -c switch tells f2py to compile the source file(s) listed and build a module, and
the -m switch specifies the name (stem) of that module. f2py will create something
called that stem with .so appended to it, so if all is well you will get a file called
gcf.so. The output file gcf.so is a callable module from within python:
>>> import gcf # note that no .so is needed
>>> gcf.getgcf(8,4) # call the function in the usual way
4
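A handy way to check a wrapped Fortran function is to compare it against a few lines
of pure Python that do the same job. The following is a sketch of such a cross-check,
not part of the f2py workflow itself. It copies the brute-force loop of the F90
function, with one assumption added by me: the result starts at 1, so something
sensible comes back even if the loop never finds a larger factor:

```python
def getgcf(x, y):
    """
    returns the greatest common factor of the positive integers x and y,
    using the same brute-force loop as the F90 function
    """
    gcf = 1
    for i in range(1, min(x, y) + 1):   # Fortran's do i = 1, min(x,y) includes the end value
        if x % i == 0 and y % i == 0:
            gcf = i
    return gcf

print(getgcf(8, 4))
print(getgcf(12, 18))
```

If the compiled module and the Python version disagree for the same inputs, the first
thing to suspect is the type of the arguments being passed in.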
The brute force method is not without potential problems. For example, we
could use the function without checking for variable type and send the F90 function
something with the wrong type, getting back the wrong answer with no error
message. Also, f2py assumes that all variables are going IN to the subroutine and
doesn't know that some are intended to come out. So we need a way to teach f2py
which variables are intended to go in and which are intended to come out. This can
be done either with a signature file if you can't modify the Fortran code itself, or
by inserting a few python hints into the Fortran code itself.
To illustrate the problem, let's consider another F90 subroutine, sph_azi, from the
Fortran chapter:
subroutine SPH_AZI(flat1, flon1, flat2, flon2, del, azi)
   implicit none
   real :: flat1,flon1,flat2,flon2,del,azi,pi,raddeg,theta1,theta2, &
           phi1,phi2,stheta1,stheta2,ctheta1,ctheta2,               &
           sang,cang,ang,caz,saz,az
   if ( (flat1 == flat2 .and. flon1 == flon2) .or.     &
        (flat1 == 90. .and. flat2 == 90.) .or.         &
        (flat1 == -90. .and. flat2 == -90.) ) then
      del=0.
      azi=0.
      return
   end if
   pi=3.141592654
   raddeg=pi/180.
   theta1=(90.-flat1)*raddeg
   theta2=(90.-flat2)*raddeg
   phi1=flon1*raddeg
   phi2=flon2*raddeg
   stheta1=sin(theta1)
   stheta2=sin(theta2)
   ctheta1=cos(theta1)
   ctheta2=cos(theta2)
   cang=stheta1*stheta2*cos(phi2-phi1)+ctheta1*ctheta2
   ang=acos(cang)
   del=ang/raddeg
   sang=sqrt(1.-cang*cang)
   caz=(ctheta2-ctheta1*cang)/(sang*stheta1)
   saz=-stheta2*sin(phi1-phi2)/sang
   az=atan2(saz,caz)
   azi=az/raddeg
   if (azi.lt.0.) azi=azi+360.
end subroutine SPH_AZI
Let's try the brute force f2py way:
f2py -c sph_azi.f90 -m sph_azi
and try sph_azi out in Python:
%python
>>> import sph_azi
>>> delta,azi=0.,0. # have to declare these..
>>> sph_azi.sph_azi(33,-117,41,-72,delta,azi) # note module/function name are same.
>>> print delta, azi
0.0 0.0
Well, that didn't work. The problem is that Python can't get the variables delta,
azi back out from the subroutine. The Fortran style of using the entrance as the exit
makes it tough to figure out which variables are supposed to go in and which are
supposed to come out, so f2py doesn't even try. The solution is to use the signature
file method.
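One way to see why the call left delta and azi at 0.0 is to look at the Python side
alone. In the sketch below, fake_sph_azi is a made-up stand-in written in pure Python
(it is not the wrapped Fortran routine and does no real calculation). Assigning to an
argument inside a function only changes the name inside that function, so numbers
passed in can never be changed for the caller:

```python
def fake_sph_azi(flat1, flon1, flat2, flon2, delta, azi):
    # stand-in for the wrapped subroutine (NOT the real calculation):
    # these assignments only change the names inside this function
    delta = 36.4
    azi = 64.1

delta, azi = 0., 0.
fake_sph_azi(33, -117, 41, -72, delta, azi)
print(delta)
print(azi)
```

This is why results have to come back through the return value of the call, which is
what declaring the intent of each variable makes possible.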

6.10.2

Signature files

We can use f2py to create a signature file sph_azi.pyf with the command:
f2py sph_azi.f90 -m sph_azi -h sph_azi.pyf
The syntax is a little different from the brute force method in that we are not
compiling sph_azi yet, we are just creating the .pyf file (specified by the -h switch),
for use in the eventual sph_azi module (specified by the -m switch).
If we look inside the sph_azi.pyf file, we find:
!    -*- f90 -*-
! Note: the context of this file is case sensitive.
python module sph_azi ! in
    interface  ! in :sph_azi
        subroutine sph_azi(flat1,flon1,flat2,flon2,del,azi) ! in :sph_azi:sph_azi.f90
            real :: flat1
            real :: flon1
            real :: flat2
            real :: flon2
            real :: del
            real :: azi
        end subroutine sph_azi
    end interface
end python module sph_azi
! This file was auto-generated with f2py (version:2).
! See http://cens.ioc.ee/projects/f2py2e/
At this point, all we have is a list of all the variables f2py found. Note that while
Fortran doesn't care about case, Python does, so the variable names within the .f90
code are converted to lower case by default. (You can suppress this with the
--no-lower switch.) The default assumption is that all of the variables found are headed
IN to the subroutine and not intended to come out. Because in the case of sph_azi
this is not true (del and azi are coming out), we must edit the .pyf file to explicitly
say which variables are coming in and which are coming out. This is done
by inserting the attribute intent(in) or intent(out) after the variable type:
python module sph_azi ! in
    interface  ! in :sph_azi
        subroutine sph_azi(flat1,flon1,flat2,flon2,del,azi)
            real intent(in) :: flat1
            real intent(in) :: flon1
            real intent(in) :: flat2
            real intent(in) :: flon2
            real intent(out) :: del
            real intent(out) :: azi
        end subroutine sph_azi
    end interface
end python module sph_azi
If we save the modified .pyf file as sph_azi1.pyf, we can compile sph_azi with the
command:
% f2py -c sph_azi1.pyf sph_azi.f90
which creates a new file sph_azi.so. Let's see if that one works better:
% python
>>> import sph_azi
>>> delta,azi=sph_azi.sph_azi(33,-117,41,-72)
>>> print delta, azi
36.4012794495 64.0623321533
And it does!
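As a sanity check on the wrapped routine, the same formulas can be typed into pure
Python using the math module. This is only a sketch for cross-checking, a line by line
translation of SPH_AZI that returns the two results instead of passing them back
through the argument list. Because Python works in double precision and the Fortran
code in single precision, the two answers should agree to a few decimal places rather
than exactly:

```python
import math

def sph_azi(flat1, flon1, flat2, flon2):
    """
    returns (del, azi) in degrees: the angular distance from point 1 to
    point 2 and the azimuth of point 2 as seen from point 1
    """
    if ((flat1 == flat2 and flon1 == flon2) or
            (flat1 == 90. and flat2 == 90.) or
            (flat1 == -90. and flat2 == -90.)):
        return 0., 0.
    raddeg = math.pi/180.
    theta1 = (90.-flat1)*raddeg          # colatitudes in radians
    theta2 = (90.-flat2)*raddeg
    phi1 = flon1*raddeg                  # longitudes in radians
    phi2 = flon2*raddeg
    stheta1, stheta2 = math.sin(theta1), math.sin(theta2)
    ctheta1, ctheta2 = math.cos(theta1), math.cos(theta2)
    cang = stheta1*stheta2*math.cos(phi2-phi1)+ctheta1*ctheta2
    ang = math.acos(cang)
    sang = math.sqrt(1.-cang*cang)
    caz = (ctheta2-ctheta1*cang)/(sang*stheta1)
    saz = -stheta2*math.sin(phi1-phi2)/sang
    azi = math.atan2(saz, caz)/raddeg
    if azi < 0.:
        azi = azi+360.
    return ang/raddeg, azi

delta, azi = sph_azi(33, -117, 41, -72)
print('%.3f %.3f' % (delta, azi))
```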

6.10.3

F90 Surgery

In the last section we saw how to use F90 code, without touching it, just by clarifying
a few things for poor old f2py. If you can modify the F90 source, you can skip the
signature file by inserting special commands, called f2py directives, that the Fortran
compiler will ignore (because they start with a !), but f2py will recognize (sneaky,
huh?). Here is an example of a duly modified code, saved as sph_azi2.f90:
subroutine SPH_AZI(flat1, flon1, flat2, flon2, del, azi)
   implicit none
   real :: flat1,flon1,flat2,flon2,del,azi,pi,raddeg,theta1,theta2, &
           phi1,phi2,stheta1,stheta2,ctheta1,ctheta2,               &
           sang,cang,ang,caz,saz,az
!f2py intent(in) flat1
!f2py intent(in) flon1
!f2py intent(in) flat2
!f2py intent(in) flon2
!f2py intent(out) del
!f2py intent(out) azi
   if ( (flat1 == flat2 .and. flon1 == flon2) .or.     &
        (flat1 == 90. .and. flat2 == 90.) .or.         &
        (flat1 == -90. .and. flat2 == -90.) ) then
      del=0.
      azi=0.
      return
   end if
   pi=3.141592654
etc.
We can then compile the modified code with the command:
f2py -c -m sph_azi2 sph_azi2.f90
This works just like the signature file example but using the module name
sph_azi2 instead of sph_azi.

6.10.4

A few more things you need to know:

- NB: If you are modifying fixed-form Fortran 77, write the directives as comment
  lines beginning with Cf2py instead of !f2py. I have never used this form, so
  you are on your own here.
- For variables that are both in and out, use: intent(inout)
- For variables that are changed in place, use: intent(inplace)
- For allocatable variables, use, e.g., intent(out,allocatable)
- Watch out with arrays: Fortran stores arrays in column-major order and NumPy by
  default in row-major order, so a Fortran array and the default NumPy array look
  like the transpose of one another! There are ways spelled out in the NumPy
  documentation for making your Python arrays behave the same as the Fortran ones -
  so go read that before you get fancy ideas.
- f2py converts all the Fortran subroutine names to lower case. You can suppress
  this with the --no-lower option.
- For more detailed information on what f2py is really doing, see the
  documentation available here:
  http://www.scipy.org/F2py
- There is also a helpful, but somewhat dated reference manual available here:
  http://cens.ioc.ee/projects/f2py2e/usersguide/
  Note that in the latter, there are many references to a module named Numeric,
  which is a predecessor of NumPy - so don't try to call it because you don't have it
  installed.
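The warning about arrays comes down to the order in which the elements of a table are
laid out in memory. The sketch below uses plain Python lists rather than NumPy (an
assumption made to keep it self-contained) and a made-up 2 x 3 table to show the two
orderings side by side:

```python
# a 2 x 3 table as a list of rows (made-up values)
table = [[11, 12, 13],
         [21, 22, 23]]
nrows, ncols = 2, 3

# row-major order (the NumPy default, as in C): the last index varies fastest
row_major = [table[i][j] for i in range(nrows) for j in range(ncols)]

# column-major order (Fortran): the first index varies fastest
col_major = [table[i][j] for j in range(ncols) for i in range(nrows)]

print(row_major)
print(col_major)
```

The same six numbers come out in a different sequence, which is why an array handed
from one language to the other can appear transposed if the ordering is not taken
into account.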
ASSIGNMENT P4
Modify your ASSIGNMENT F8 such that the main program is in Python and
calls your F90 subroutine. Use the F90 surgery method above to modify your
subroutine such that it compiles without the use of signature files. E-mail both files
plus the script you would use to compile them to ltauxe@ucsd.edu

6.11

Classes

Before we go any further, we need to learn some basic concepts about classes. These
are the basis of object oriented programming, OOP (that again!). Class objects lie
behind plotting, for example, and a rudimentary understanding of what they are and
how they work will come in handy when we start doing anything but the simplest
plotting exercises.
A class object is created by a call to a class definition, which can be
thought of as a blueprint for the class object. Here is a simple example of a class
definition:


class Circle:
    """
    This is a simple example of a class
    """
    pi=3.141592653589793
    def __init__(self,r):
        self.r=r
    def area(self):
        return self.pi*self.r**2
    def circumference(self):
        return 2.*self.pi*self.r

Saving this class in a file called Shapes.py we can use it in a Python session in a
manner similar to function modules:
>>> import Shapes # import the module that holds the class
>>> r=4.0
>>> C=Shapes.Circle(r) # create a class instance with r=4.
>>> C.pi # retrieve an attribute
3.141592653589793
>>> C.area() # call a method
50.26548245743669
>>> C.r=2.0 # change the value of r
>>> C.area() # get a new area
12.566370614359172
In spite of superficial similarities, classes are not the same as functions. Although
the Shapes module is imported just the same as any other, to use it, we first have
to create a class instance (C=Shapes.Circle(r)). C is an object with attributes
(variables) and methods. All methods (the parts that start with def) have an
argument list. The first argument has to be a reference to the class instance itself,
or self, followed by any variables you want to pass into the method. The __init__
method initializes the instance attributes of an object. In the above case, it defined
the attribute r, which gets passed in when the class is first called. Asking for any
attribute (note the lack of parentheses) retrieves the current value of that attribute.
Attributes can be changed (as in C.r=2.0).
The other methods (area and circumference) are defined like any function except
note the use of self as the first argument. This is required in all class method
definitions. In our case, no other parameters are passed in because the only one
used is r, so the argument list consists of only self. Calling these methods returns
values computed from the current values of the attributes.


You can make a subclass (child) of the parent class which has all the attributes
and methods of the parent, but may have a few attributes and methods of its own.
You do this by naming the parent class in parentheses in the child's class statement, e.g., class Child(Parent):.
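Here is a minimal sketch of a subclass. It assumes a Circle class like the one above whose area() returns pi times r squared; the Semicircle class and its perimeter() method are made up for illustration and are not part of Shapes.py.

```python
class Circle:
    """A circle of radius r (area = pi*r**2)."""
    pi = 3.141592653589793
    def __init__(self, r):
        self.r = r
    def area(self):
        return self.pi * self.r**2
    def circumference(self):
        return 2. * self.pi * self.r

class Semicircle(Circle):        # Semicircle is a child of Circle
    """Half a circle; inherits __init__, pi and circumference() from Circle."""
    def area(self):              # override the parent's method
        return 0.5 * Circle.area(self)
    def perimeter(self):         # a new method the parent does not have
        return 0.5 * self.circumference() + 2. * self.r

S = Semicircle(2.0)              # uses the __init__ inherited from Circle
print(S.r)
print(S.area())
print(S.perimeter())
```

The child only spells out what is different from the parent; everything else is inherited.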
So, the bottom line about classes is that they are in the same category of things
as variables, lists, dictionaries, etc. That is, they are data structures - they hold
data, and the methods to process that data. If you are curious, there's
lots more to know about classes that we don't have time to get into, but you can
find useful tutorials online
(e.g., http://www.sthurlow.com/python/lesson08/).
ASSIGNMENT P5
Write a module called Shapes.
Shapes should have classes for: circle, sphere, cylinder, rectangle, and cube.
These classes should have methods that return things like volume, circumference,
area, and mass (pass the argument density), where appropriate.
Write a class called Earth that has attributes using the bulk parameters from
assignment P3.
Write a program that takes the Earth's density and radius from the Earth class, passes
these to the sphere class and calculates the mass. How does this compare with
the mass given by the Earth class?

6.12 Matplotlib

So far you have learned the basics of Python, NumPy and how to link F90 code with
Python. But Python was sold as a way of visualizing data and we haven't yet seen
a single plot! There are many plotting options within the Python umbrella. The
most mature and the one I am most familiar with is matplotlib, a popular graphics
module of Python. Actually matplotlib is a collection of a bunch of other modules,
toolkits, methods and classes. For a fairly complete and readable tour of matplotlib,
check out these links:
http://matplotlib.sourceforge.net/Matplotlib.pdf
and here:
http://matplotlib.sourceforge.net/

6.12.1 A first plot

Let's start with a simple plot script (matplotlib1.py):


#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg") # my favorite backend
import pylab # module with matplotlib
pylab.plot([1,2,3]) # plot some numbers
pylab.ylabel('Y') # label the y-axis
pylab.show() # reveal the plot
The first step should be obvious by now: it imports matplotlib. Figures are
rendered on backends so they appear on screen. There are a lot of different backends with slightly different looks. Some work better on different operating systems.
I use the very old school backend called TkAgg because it works. So
step 2 sets the backend: matplotlib.use("TkAgg"). The module matplotlib itself
contains a lot of other modules. One of these, pylab, is the business end that has
a lot of plotting methods and classes. It must be loaded alongside matplotlib, so
step 3 is: import pylab. After that the fun starts.
In the above example, we call the plot method with a list as an argument. As
I mentioned, matplotlib uses the concept of classes to make plots and this has
just happened behind the scenes. We could have created named figure and axes instances
ourselves (e.g., fig=pylab.figure() followed by ax=fig.add_subplot(111)) and then referred to them later with
the command ax.plot([1,2,3]), but we don't have to in this simple case - the class
instances are implied and are the current figure and axes. You can tell this if you do the above
example in interactive mode:
>>> import matplotlib
>>> matplotlib.use("TkAgg")
>>> import pylab
>>> pylab.plot([1,2,3])
[<matplotlib.lines.Line2D object at 0x4bd6eb0>]
The bit about [<matplotlib.lines.Line2D object at 0x4bd6eb0>] is Python's way of
telling you that you just created an object and something about it. In any case,
when you give plot() a single sequence of values (as above), it assumes they are y
values and supplies the x values for you.
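If you want to poke at the objects that plot() creates, here is a minimal sketch that keeps explicit handles on them. It assumes the non-interactive Agg backend (so that no window is needed) and imports pylab from inside matplotlib; the names fig, ax, lines and line are just illustrative.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend, so no window is needed
from matplotlib import pylab

fig = pylab.figure()             # a Figure instance
ax = fig.add_subplot(111)        # an Axes instance that lives in the figure
lines = ax.plot([1, 2, 3])       # plot() returns a list of Line2D objects
ax.set_ylabel('Y')

line = lines[0]
print(type(line))                # the class of the object that plot() created
print(line.get_xdata())          # the x values that plot() supplied for us
print(line.get_ydata())          # the y values we passed in
```

The Line2D object carries the data and the line style around with it, which is why it can be modified after it has been created.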
Attributes of the current plot, such as the Y axis label, can be changed with the
ylabel method. As you can imagine, there are LOTS of methods, including, surprise,
an xlabel method.
When we are done customizing the plot instance, we can view it with the show
method. When that gets executed, we will get a plot something like this:

[Figure: the plot window. Use the red button to close the plot window; use the toolbar buttons to save and modify your plot.]

Once that happens, we won't be able to change the plot any more and in fact, we
won't get our terminal back until the little plot window is closed. You can save your
plot with the little disk icon in a variety of formats. Adobe Illustrator likes .svg, or
.eps while Microsoft products like .png file formats.
If you find it annoying to always have to close figures with the little red button,
or save them with the disk icon, you can tweak the program like this:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
pylab.ion() # turn on interactivity
pylab.plot([1,2,3])
pylab.ylabel('Y')
pylab.draw() # draw the current plot
ans=raw_input('press [s] to save figure, any other key to quit: ')
if ans=='s':
    pylab.savefig('myfig.eps')
The method pylab.savefig('FILENAME.FMT') saves the figure to a file. The .FMT can be one of several
formats (e.g., .eps, .svg, .ps, .pdf, .png, .gif, .jpg, etc.). Some of these (the vector graphics ones
like pdf, ps, eps and svg) can be opened in Adobe Illustrator for modification.
As mentioned earlier, if you give plot() a single sequence of values, it assumes
they are y values and supplies the x values for you. Garbage in, garbage out. But
plot() takes an arbitrary number of arguments of the form: (X1 , Y1 , line style 1,
X2 , Y2 , line style 2, etc.), where line style is a string that specifies the line style as
illustrated in this script called matplotlib2.py
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,numpy
x=numpy.arange(0,360,10)
r=x*numpy.pi/180.
c=numpy.cos(r)
s=numpy.sin(r)
pylab.plot(x,c,'r--',x,s,'g^')
pylab.show()
which produces the plot:
[Figure: cos (red dashed line) and sin (green triangles) versus angle in degrees; x-axis from 0 to 350, y-axis from -1.0 to 1.0.]
From the code, you can probably figure out that a line style of 'r--' is a red dashed
line, and 'g^' means green triangles. There are many other attributes that can be
controlled: linewidth, dash style, etc. and I invite you to check out the matplotlib
documentation.
By now, you should understand enough about classes, keyword argument passing
and other pythonalia to be able to figure things out on your own. But don't panic,
I'm going to lead you through a few more examples, which I hope will speed you on
your plotting way.

6.12.2 Multiple figures and more customization

As already mentioned, pylab has the concept of a current figure which subsequent
commands refer to. In the preceding examples, we only had one figure, so we didn't
have to name it, but for fancier figures with several plots, we can create named
figure objects by invoking a figure instance:
fig = pylab.figure(num=1,figsize=(5,7))
Notice the syntax whereby figsize sets the width and height (in inches)
with a tuple and num is the figure number. Notice that these are keyword
arguments, and that there are many more: consult the list of **kwargs in the online
documentation located here:
http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.figure
Once we have a figure instance (sometimes called a container), we can do all
kinds of things, including adding subplots. To do this, we can use the syntax:
fig.add_subplot(211)

Here the argument 211 means 2 rows, one column and this is the first plot. To make
plots side by side, you would use: fig.add_subplot(121) for 1 row, two columns, etc.
After each add_subplot command, that subplot becomes the current axes for
plotting on.
If you want more freedom, say, you want to make a subplot at
an arbitrary place, use the add_axes([left, bottom, width, height]) method, e.g.,
fig.add_axes([0.1,0.1,0.7,0.3]). The values are 0-1 in relative figure coordinates.
To illustrate these new concepts, consider the example code, matplotlib3.py:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab, numpy
def f(t):
    return numpy.exp(-t)*numpy.cos(2.*numpy.pi*t)
t1= numpy.arange(0.,5.,0.1)
t2= numpy.arange(0.,5.,0.02)
fig=pylab.figure(num=1,figsize=(7,5))
fig.add_subplot(211)
pylab.plot(t1,f(t1),'bo')
pylab.plot(t1,f(t1),'k-')
fig.add_subplot(212)
pylab.plot(t2,numpy.cos(2*numpy.pi*t2),'r--')
pylab.xlabel('Time (ms)')
fig.add_axes([.6,.75,.25,.10])
pylab.plot([0,1],[0,1],'r-',[0,1],[1,0],'r-')
pylab.ylabel('Inset')
pylab.show()
which produces:
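[Figure: two stacked subplots with a small inset. Top: damped cosine plotted as blue dots joined by a black line; bottom: cosine as a red dashed line, x-axis labeled Time (ms); inset: two crossed red lines, y-axis labeled Inset.]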

By now, you should be able to figure out what everything in that script does by
yourself!

6.12.3 Adding text

We already met xlabel and ylabel. But text can be added in other ways, e.g., using
the title, text, legend and arrow methods. Let's decorate one of our early examples
to show how some of these things work:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,numpy
x=numpy.arange(0,360,10)
r=x*numpy.pi/180.
c=numpy.cos(r)
s=numpy.sin(r)
s2=numpy.sin(r)**2
pylab.plot(x,c,'r--',x,s,'g^',x,s2,'k-')
pylab.title('Fun with trig')
pylab.text(250,-.5,'pithy note')
pylab.legend(['cos(x)',\
    'sin(x)',r'$\sin^2(x)$'],loc='lower left')
pylab.xlabel(r'$\theta$')
pylab.annotate('triangles!',\
    xy=(175,0),xytext=(110,-.25),\
    arrowprops=dict(facecolor='black',\
    shrink=0.05))
pylab.show()
which produces this plot:
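[Figure: the three curves (red dashed line, green triangles, black line) under the title Fun with trig, with a legend in the lower left corner, the text pithy note, and an arrow labeled triangles! pointing at the green triangles; x-axis from 0 to 350, y-axis from -1.0 to 1.0.]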

The title appears at the top of the plot. Text labels get placed at the x and
y coordinates on the plot and the legend will appear in the upper/lower right/left
corner as specified in the string. The pylab.text(x,y,string,**kwargs) method also has
optional keyword arguments, specifying font, size, color and the like. The legend
label list is a list of labels for each plot element. So, for every line or point style that
you want in your legend, append a label to the label list in the order of the relevant plot
commands. Also note that the legend and xlabel methods use a special format for
strings (r'$LaTeX string$') which allows embedded LaTeX equation syntax to make
scientific equations look right - so now you have to learn LaTeX! Finally, the arrow
gets drawn with the annotate method, which has a lot of other attributes as well.
Check the matplotlib documentation for details.


There are lots of graphing styles possible with matplotlib, e.g., histograms, pie
charts, contour plots, whisker plots, etc. I'm just going to show you a few examples.
The best thing to do is to look through the online documentation for a plot that
looks like what you need, then modify it. This is ALWAYS a good approach - start
with something that works and fiddle with it until it suits your own particular needs.
ASSIGNMENT P6
The file SPECMAP.dat has δ18O data versus age (in ka). For those of you who
don't know, this is the classic record of Imbrie et al., 1984, of variations in the oxygen isotopic
ratio of seawater controlled by changes in global ice volume, which in turn was forced
by the Earth's orbit. Write a Python program that:
reads in the data and
makes a simple X,Y plot with symbols connected by lines
put Age on the X axis and δ18O on the Y. The LaTeX code for δ18O is:
$\delta^{18}O$
send the script to me at ltauxe@ucsd.edu

6.12.4 Histograms

I downloaded a week's worth of earthquake data (location, magnitude, etc.) from the website:
http://earthquake.usgs.gov/earthquakes/catalogs/index.php
by clicking on the XML merged catalog, past 7 days link.
This is a compressed (gnu zip) XML file. After unzipping it (by clicking on it), the
file (called merged_catalog.xml) looked something like this:
<?xml version="1.0" encoding="UTF-8"?>
<merge>
<event id="71672091" network-code="NC"
time-stamp="2011/11/03_23:03:32 " version="2">
<param name="year" value="2011"/>
<param name="month" value="10"/>
<param name="day" value="28"/>
<param name="hour" value="21"/>
<param name="minute" value="21"/>
<param name="second" value="52.7"/>
<param name="latitude" value="38.8147"/>
<param name="longitude" value="-122.7862"/>
<param name="depth" value="1.7"/>
<param name="magnitude" value="0.2"/>


<param name="num-phases" value="8"/>


<param name="dist-first-station" value="0.0"/>
<param name="rms-error" value="0.00"/>
<param name="hor-error" value="0.8"/>
<param name="ver-error" value="0.6"/>
<param name="azimuthal-gap" value="42"/>
<param name="magnitude-type" value="D"/>
<param name="magnitude-type-ext" value="Mcd = coda duration magnitude"/>
<param name="num-stations-mag" value="3"/>
<param name="stand-mag-error" value="0.1"/>
<param name="location-method" value="h"/>
<param name="location-method-ext" value="Hypoinverse (Confirmed by human review)"/>
</event>
<event id="71672096" network-code="NC"
time-stamp="2011/10/28_21:30:19 " version="0">
<param name="year" value="2011"/>
<param name="month" value="10"/>
etc.
Reading in all the data, I can plot them various ways. In this example, I plot a
histogram of the magnitudes:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
def readEQs(infile):
    input=open(infile,'rU').readlines()
    EQs=[] # list to put EQ dictionaries in
    linenum=0
    while linenum <len(input):
        if 'event id' in input[linenum]: # new event
            EQ={} # define a dictionary
            linenum+=2 # increment past time-stamp
            while 'param' in input[linenum]:
                record=input[linenum].split('=')
                datakey=record[1].split()[0].strip('"')
                EQ[datakey]=record[2].strip('\n').strip('/>').strip('"')
                linenum+=1 # keep going until </event>
        if '</event>' in input[linenum]: # done with event
            EQs.append(EQ)
        linenum+=1 # look for next event id
    return EQs
EQs=readEQs('merged_catalog.xml')
Magnitudes=[] # set up container
for eq in EQs: # step through earthquake dictionaries
    Magnitudes.append(float(eq['magnitude'])) # collect magnitudes
pylab.hist(Magnitudes,bins=50,normed=True) # plot 'em (newer matplotlib: density=True)
pylab.xlabel('Richter Magnitude')
pylab.ylabel('Frequency')
pylab.show()

[Figure: normalized histogram of the earthquake magnitudes; x-axis: Richter Magnitude, y-axis: Frequency (0.0 to 0.8).]

In the example, please notice the clever way in which we parse the XML code.
First, we look for the new event marker and skip past the time-stamp line. Then, for each param line, we split the record on its = signs. This gives
a three-element list with what we need. The second element contains the key and the
third element the value. To get the key name, we split the second element on the space, which puts the
key name (enclosed in quotes) in the first element of a list,
which we select and strip of its quotes. This we use as the key in the dictionary,
EQ. To get the value, we use the third element in the record list (record[2]) and strip
off the end of line character, the closing /> and the quotes. We pair the value with the key in the
EQ dictionary and continue, incrementing linenum until we have all the keys picked
off and hit the </event> line. When we are done with a dictionary, we append
it to a list of all the dictionaries, increment linenum and press on. We keep going
until we have read all the data.
After we have parsed the data file into a list of dictionaries, we make a list for the
thing we want to plot (Magnitudes), hunt through the list of dictionaries, picking out the magnitude
data and, after turning it into a float, append it to the list. We plot the histogram
using the pylab.hist method. We can label the plot as per usual and display it with
show(). Because the dictionaries are kept in a list, any other key could
easily get fished out later for plotting.
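To see the split-and-strip trick in slow motion, here is a minimal sketch that parses a single param line by hand. The line is typed in as a string rather than read from the catalog file, and the variable names follow the readEQs function.

```python
line = '<param name="magnitude" value="0.2"/>\n'

record = line.split('=')
# record is now ['<param name', '"magnitude" value', '"0.2"/>\n']
print(record)

# key: first word of the second element, with the quotes stripped off
datakey = record[1].split()[0].strip('"')

# value: third element without the end of line, the closing /> and the quotes
value = record[2].strip('\n').strip('/>').strip('"')

EQ = {}
EQ[datakey] = value
print(EQ)
print(float(EQ['magnitude']))    # values are stored as strings; convert when needed
```

Note that strip() removes any of the listed characters from both ends of the string, which is why strip('/>') takes care of the closing tag.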

6.12.5 Pie Charts

I mostly think pie charts are silly, but some people love them. So if you want to see
data as a pie chart, say the fraction of earthquakes in each magnitude bin from the
last example, we can modify the script thusly:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
from readEQs import *
EQs=readEQs('merged_catalog.xml')
Fracs,Labels=[],[]
bin0=0
for m in range(1,8): # assume no magnitudes bigger than 7 last week!
    num=0 # initialize count
    for eq in EQs:
        eqm=float(eq['magnitude'])
        if eqm>=bin0 and eqm<m:num+=1 # count all magnitudes in this bin
    Fracs.append(float(num))
    Labels.append(str(bin0)+'-'+str(m))
    bin0=m # increment to next bin
pylab.pie(Fracs, labels=Labels)
pylab.axis('equal') # make the pie round!
pylab.title('Silly Pie Chart')
pylab.show()

[Figure: pie chart titled Silly Pie Chart with wedges labeled 0-1, 1-2, 2-3, 3-4, 4-5, 5-6 and 6-7.]

Notice how the function readEQs from the histogram example has been put into a
module by itself and then called within this program.
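The counting loop can also be handed over to NumPy. This is a sketch of an alternative, not what the script above does: it uses numpy.histogram on a short list of made-up magnitudes (not the USGS data) and builds the same kind of labels.

```python
import numpy

# Made-up magnitudes for illustration (not from the USGS catalog)
mags = [0.2, 0.9, 1.0, 1.4, 2.7, 3.1, 3.3, 4.8, 6.5]

edges = list(range(0, 8))                  # bin edges 0, 1, ..., 7
counts, edges_out = numpy.histogram(mags, bins=edges)

# Build labels like '0-1', '1-2', ... from neighboring edges
labels = [str(lo) + '-' + str(hi) for lo, hi in zip(edges[:-1], edges[1:])]
for label, n in zip(labels, counts):
    print(label + ': ' + str(n))
```

Each bin includes its lower edge and excludes its upper edge, except the last bin, which includes both. The counts and labels could then be passed to pylab.pie in place of Fracs and Labels.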

6.12.6 Basemap

Most of your maps can be made with GMT, but Python can make maps too. Once
you know Python, it can be much easier to use than GMT because the same principles
apply and all of the power of matplotlib is available for enhancing your plots. We
generate maps with a special toolkit of matplotlib called Basemap.
Here is a simple example, using our earthquake data from the histogram example
to make a Mollweide projection with the earthquake locations as red dots.
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,numpy
from readEQs import *
from mpl_toolkits.basemap import Basemap
EQs=readEQs('merged_catalog.xml') # read in data (see histogram example)
Lats,Lons=[],[] # set up lists for location data
for eq in EQs: # step through the list of earthquakes
    Lats.append(float(eq['latitude'])) # collect the latitudes (as floats)
    Lons.append(float(eq['longitude']))
map=Basemap(projection='moll',lon_0=0,resolution='c') # create a map instance
map.drawcoastlines()
map.drawmapboundary()
map.drawmeridians(numpy.arange(0,360,30)) # draws longitudes from list
map.drawparallels(numpy.arange(-60,90,30)) # draws latitudes from list
X,Y=map(Lons,Lats) # calculates the projection of the X,Y
pylab.plot(X,Y,'ro') # uses pylab's plot to plot these arrays
pylab.savefig('map.eps') # save the figure in EPS format


In this example, the list of earthquake dictionaries, EQs, is read in with the same
file reading function from the histogram example. Then we fish out the latitude
and longitude from each earthquake dictionary (eq) and append the floating point
equivalent (remember they are strings in the dictionary!) to the relevant lists
(Lats, Lons).
The basic concept of Basemap is that you create a map class instance with a call
to Basemap. The attributes of your map (e.g., resolution, projection, central longitudes, map boundaries, map center, etc.), i.e., all the things you set with switches
in pscoast, are set with keyword arguments in the call to Basemap. The details of
these keyword arguments depend on the projection you choose.
After you create the map instance (called map in the above example), you can
modify it using the methods available to this particular class. To draw the
coastlines, use the method drawcoastlines(). To draw the lines of longitude we use
the method drawmeridians() with an array (or list) containing the longitudes you
want to plot. In this case we generated the array with numpy.arange, but we could
have used range(0,360,30) or any arbitrary integer or floating point list. Drawing
the lines of latitude works the same way (with the drawparallels() method).
Now we want to plot a bunch of points on the map. By sending in the longitudes
and latitudes of the earthquake locations as arguments to the map instance, the instance
chews on them and spits out x,y values based on the projection we specified when
creating the map instance with Basemap. Now we can plot the returned X, Y lists
using the regular pylab plotting functions. We can go on and decorate the map
with anything available in pylab. The advantage of using Basemap over, say, GMT
is you can build on your knowledge of matplotlib, instead of struggling with two
completely different and incompatible plotting packages.
For more examples, check the documentation available here:
http://matplotlib.github.com/basemap/users/examples.html
ASSIGNMENT P7
Redo your GMT1 assignment using the Basemap toolkit! Add a point labelled
"Now I'm here!" at the location of SIO.

6.12.7 Contour plots

Of the many plotting methods available in matplotlib, one of the more useful for
Earth Scientists is the contour plot. Here is an example of the gravity anomaly
created by a buried sphere:


#!/usr/bin/env python
import numpy
import matplotlib
matplotlib.use("TkAgg")
import pylab
G=6.67e-11 # grav constant in Nm^2/kg^2 (SI)
R=2. # radius in meters
z=3. # depth of burial
drho=500 # density contrast in kg/m^3
x=numpy.arange(-2.*z,2.*z,0.1)
y=numpy.arange(-2.*z,2.*z,0.1)
X,Y=pylab.meshgrid(x,y)
h=numpy.sqrt(X**2+Y**2)
g=(G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
pylab.imshow(g,interpolation='bilinear',cmap=pylab.cm.Spectral)
pylab.colorbar()
pylab.axis('equal')
pylab.show()

[Figure: color image of the gravity anomaly g on the grid, with a color bar; both axes are in grid-point indices (0 to about 120).]

The first new function for us in the above example is meshgrid. This makes 2D
arrays X and Y for 3D mesh/surface plots. Both have the shape (len(y), len(x)).
Each row of X is the same as x and there is a row for every element in y. Similarly,
each column in Y is a copy of y and there is a column for every element in x.
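A tiny example makes the layout of the meshgrid output easier to see. This is a minimal sketch with made-up x and y values, calling numpy.meshgrid directly.

```python
import numpy

x = numpy.array([0., 1., 2.])      # 3 points in x
y = numpy.array([10., 20.])        # 2 points in y
X, Y = numpy.meshgrid(x, y)

print(X.shape)     # one row per y value, one column per x value
print(X)           # every row is a copy of x
print(Y)           # every column is a copy of y

# Any expression in X and Y is now evaluated at every grid point at once
H = numpy.sqrt(X**2 + Y**2)
print(H.shape)
```

Because X and Y have the same shape, arithmetic on them works element by element, with no loops over the grid required.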
The line h=numpy.sqrt(X**2+Y**2) calculates the horizontal distance h from
the origin for every point on the X,Y grid. Then we calculate the gravity anomaly
g for a spherical mass with radius (R) at depth of z with the command:
g=(G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
You can find the formula for this in any geophysics text book. Anyway, now we
have g which, if you print it out, looks like this:
[[ 1.37971509e-08 1.40028722e-08 1.42112058e-08 ..., 1.44221090e-08 1.42112058e-08 1.40028722e-08]
 [ 1.40028722e-08 1.42148210e-08 1.44295575e-08 ..., 1.46470410e-08 1.44295575e-08 1.42148210e-08]
 [ 1.42112058e-08 1.44295575e-08 1.46508813e-08 ..., 1.48751394e-08 1.46508813e-08 1.44295575e-08]
 ...,
 [ 1.44221090e-08 1.46470410e-08 1.48751394e-08 ..., 1.51063696e-08 1.48751394e-08 1.46470410e-08]
 [ 1.42112058e-08 1.44295575e-08 1.46508813e-08 ..., 1.48751394e-08 1.46508813e-08 1.44295575e-08]
 [ 1.40028722e-08 1.42148210e-08 1.44295575e-08 ..., 1.46470410e-08 1.44295575e-08 1.42148210e-08]]

It is a 2D array in which every element is the value of g at that grid point. There are
a bunch of ways to visualize this, but here we plot it as a color image with imshow (pylab.contour or pylab.contourf would draw contour lines or filled contours instead). To do this we
interpolate between all the grid points and choose a color map translating the value
of g into a color from blue to orange. When choosing color maps, be aware that a
lot of people are red-green color blind and appreciate some other color contrast, like
the blue-orange one chosen here.
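A note on the formula: the expression for g in the script is G times the excess mass of the sphere, (4/3) pi R^3 drho, divided by the squared distance h^2 + z^2 to its center, i.e., the magnitude of the attraction. A gravimeter senses only the vertical component, which is smaller by the factor z/sqrt(h^2 + z^2). The sketch below compares the two along a single profile; the names dM, r2, g_total and g_vert are mine, and the profile uses linspace rather than the 2D grid from the script.

```python
import numpy

G = 6.67e-11      # gravitational constant (SI)
R = 2.            # sphere radius in meters
z = 3.            # depth of burial in meters
drho = 500.       # density contrast in kg/m^3

h = numpy.linspace(-2.*z, 2.*z, 121)       # horizontal distance along a profile
dM = 4. * numpy.pi * R**3 * drho / 3.      # excess mass of the sphere
r2 = h**2 + z**2                           # squared distance to the sphere's center

g_total = G * dM / r2                      # magnitude of the attraction (as in the script)
g_vert = g_total * z / numpy.sqrt(r2)      # vertical component only

print(g_total.max())                       # both peak directly above the sphere
print(g_vert.max())
print(g_vert[0] / g_total[0])              # ratio at the end of the profile
```

Directly above the sphere the two agree; away from it the vertical component falls off faster.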

6.13 Deeper into NumPy and Scipy

Now we have some plotting skills under our belt, we can take advantage of the computational tools available in NumPy and another scientific package called Scipy. We
have already met these: cos(), sin(), pi, arctan2(), arange(), among others. Now
meet polyfit(x,y,order), and polyval(coeffs,x). The former fits a polynomial of the given order
to input data and returns an array of the coefficients and the latter evaluates the polynomial
at x using coefficients returned from polyfit.
To find a best fit line ($y = mx + b$ where m is the slope and b is the intercept) we
can use polyfit() by setting order to one. Order is two for a quadratic polynomial
($y = ax^2 + bx + c$) and three for a cubic ($y = ax^3 + bx^2 + cx + d$). See how these work
in the following example:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import numpy,pylab
from numpy import random
x,y=[1.,2.,3.,4.],[1.,1.8,3.4,4.2] # defines two lists
X=numpy.arange(x[0],x[-1]+.5,.1)
pylab.plot(x,y,'ro')
coeffs=numpy.polyfit(x,y,1)
Y=numpy.polyval(coeffs,X)
pylab.plot(X,Y,'r-')
y2=[3.42, 11.24, 19.86, 34.87]
pylab.plot(x,y2,'bs')
coeffs2=numpy.polyfit(x,y2,2)
Y2=numpy.polyval(coeffs2,X)
pylab.plot(X,Y2,'b-')
pylab.show()

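[Figure: the (x, y) data as red dots with the best-fit line in red, and the (x, y2) data as blue squares with the best-fit quadratic in blue; x-axis from 1.0 to 4.5, y-axis from 0 to 45.]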

In the first call to the function polyfit(), it returns an array coeffs=[ 1.12 -0.2 ], with
the coefficients m and b; it is a NumPy array rather than a list, which is why it prints without commas. This array can be
passed directly into polyval(), which returns the y values evaluated at the positions
in the X list (or array) that is passed in. We plot these values as the red line,
compared to the original data points (x, y) which are red dots.
In the second call to the function polyfit(), we sent it new values for y (y2) and
asked for a second order polynomial fit. coeffs2 is [ 1.7975 1.3095 0.5925] which are
a, b, c respectively. This in turn gets churned through polyval() and plotted as the
blue curve (as compared to y2 which are the blue squares).
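To convince yourself of what polyfit() and polyval() return, here is a minimal sketch using the same x and y as the first fit, without any plotting. The names m, b, fit and by_hand are just illustrative.

```python
import numpy

x = [1., 2., 3., 4.]
y = [1., 1.8, 3.4, 4.2]

coeffs = numpy.polyfit(x, y, 1)     # a NumPy array: [slope, intercept]
m, b = coeffs                       # unpack it like any other sequence
print(m)
print(b)

# polyval evaluates m*x + b; compare with doing it by hand
fit = numpy.polyval(coeffs, x)
by_hand = m * numpy.array(x) + b
print(numpy.allclose(fit, by_hand))

# convert to a plain list if you really want the commas
print(coeffs.tolist())
```

The coefficients come back highest power first, which is the order that polyval() expects.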

6.13.1 More on slicing with arrays

We zoomed by the finer points of list and array indexing earlier in the chapter, eager
to get to plotting. Now we delve deeper.
First a review of lists. You will recall that in Python, indexing starts with 0,
so for the list L=[0,2,4,6,8], L[1] is 2. The index of the last item is -1, so L[-1] is 8.
To find out what the index for the number 4 is, for example, we have the index()
method: L.index(4), which will return the number 2. We actually already used
this method when we implemented command line arguments, but it wasn't really
explained. We know that to reassign a given index a new value we use the syntax
L[1]=2.5. And to use a part of a list (a slice) we use, e.g., B=L[2:4], which defines
B as a list with L's elements 2 and 3 (4 and 6). And you also know that B=L[2:]
takes all the elements from 2 to the end. From these examples, you can infer that
the basic syntax for slicing is [start:stop:step]; if the step is omitted it is assumed
to be 1.
Arrays (and matrices) work in a similar fashion to lists, but these are multidimensional objects, so things get hairy fast.
The basic syntax is the same,
[start:stop:step], or i:j:k, but with NumPy arrays we give one such slice for each dimension, separated by commas: the first selects rows and the second selects columns. This is best shown with examples:
>>> import numpy
>>> A=numpy.linspace(0,29,30)
>>> B=A.reshape(5,6)
>>> B
array([[  0.,   1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.,  16.,  17.],
       [ 18.,  19.,  20.,  21.,  22.,  23.],
       [ 24.,  25.,  26.,  27.,  28.,  29.]])
>>> B[1:3,:-1:2]
array([[  6.,   8.,  10.],
       [ 12.,  14.,  16.]])

Let's pick apart the statement B[1:3,:-1:2] to see if we can understand what it
does. The first part alone returns rows 2 and 3:
>>> B[1:3]
array([[  6.,   7.,   8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.,  16.,  17.]])
Here j goes from [:-1], in other words, we take all but the last element of each row:
>>> B[1:3,:-1]
array([[  6.,   7.,   8.,   9.,  10.],
       [ 12.,  13.,  14.,  15.,  16.]])
And finally, we have the step of 2, which takes every other element:
>>> B[1:3,:-1:2]
array([[  6.,   8.,  10.],
       [ 12.,  14.,  16.]])

For more on array slicing (indexing), see:


http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
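Here are a few more slices of the same 5 by 6 array to practice on. This is a minimal sketch; B is rebuilt with numpy.arange so that the block stands on its own.

```python
import numpy

B = numpy.arange(30.).reshape(5, 6)    # same values as the B used above

print(B[1:3, :-1:2])     # rows 1 and 2; every other column, leaving out the last
print(B[:, 0])           # the first column, returned as a 1D array
print(B[-1, ::-1])       # the last row, reversed
print(B[::2, 1])         # column 1 of every other row
```

A bare colon means "everything along this axis", and a negative step runs through the elements backwards.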

6.13.2 Looping through arrays

Earlier in the course, we learned that for loops with lists just step through item by
item. In n-dimensional arrays, they step through row by row (like in slicing). For
example,
>>> for r in B:
...     print r
...
[ 0.  1.  2.  3.  4.  5.]
[  6.   7.   8.   9.  10.  11.]
[ 12.  13.  14.  15.  16.  17.]
[ 18.  19.  20.  21.  22.  23.]
[ 24.  25.  26.  27.  28.  29.]

If you really want to step through element by element, you can use the ravel()
method which flattens an N-dimensional array to a single dimension:
>>> for e in B.ravel():
...     print e
...
0.0
1.0
2.0
3.0
etc.
For more on looping (or iterating), see:
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
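Two other loops that come up a lot are stepping through columns instead of rows and stepping through elements while keeping track of where you are. Here is a minimal sketch on a small made-up array; it uses the transpose (B.T) and numpy.ndenumerate.

```python
import numpy

B = numpy.arange(6.).reshape(2, 3)

# Looping over columns: iterate over the rows of the transpose
for col in B.T:
    print(col)

# Element by element, together with the (row, column) index of each element
pairs = []
for index, value in numpy.ndenumerate(B):
    pairs.append((index, value))
    print(str(index) + ' ' + str(value))
```

The elements come out in the same row-by-row order that ravel() gives.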

6.13.3 Random Numbers

Many scientific investigations use techniques such as Monte Carlo simulations, or
bootstrap statistics, which you (hopefully) will learn about in future classes. These
applications require that you be able to generate random numbers from a variety
of distributions. Python does this with the numpy.random module. To learn more,
read the documentation at:
http://docs.scipy.org/doc/numpy/reference/routines.random.html
Let's look at a few of these:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
from numpy import random
N,min,max=500,10,20
bins=(max-min)*2
Nums=[]
for n in range(N):
    Nums.append(random.uniform(min,max))
pylab.hist(Nums,bins=bins,facecolor='orange')
pylab.title('Uniform distribution')
pylab.show()

[Figure: histogram titled Uniform distribution, drawn in orange; x-axis from 10 to 20, y-axis (counts) from 0 to 35.]

The new twist here is the call to the uniform() function of the random module
of NumPy. This returns uniformly distributed numbers between the min and max
values specified. The other embellishments were to the hist() function in matplotlib:
increasing the number of bins (the default is too few in my opinion) and changing
the color of the columns to a pretty shade of orange.
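The loop that fills Nums one number at a time can be replaced by a single call, because the functions in numpy.random take an optional size argument. This is a sketch of that alternative, not what the script above does; the names lo and hi stand in for min and max (to avoid hiding the built-in functions), and the seed is fixed only to make the run repeatable.

```python
from numpy import random

N, lo, hi = 500, 10, 20
random.seed(42)                           # fix the seed so the run is repeatable
Nums = random.uniform(lo, hi, size=N)     # one call returns an array of N numbers

print(Nums.shape)
print(Nums.min() >= lo)                   # the lower bound is included
print(Nums.max() < hi)                    # the upper bound is excluded
```

The result is already a NumPy array, so it can go straight into pylab.hist or any of the NumPy statistics functions.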

6.13.4 Normal Distribution

There are dozens of other distributions available in the random module, but the all
time favorite is the normal, or Gaussian distribution. Here we have an example of
how to use it to retrieve numbers from a normal distribution with a mean of 10 and
a standard deviation of 2:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
from numpy import random
N,mu,sigma=500,10,2
Nums=[]
for i in range(N):
    Nums.append(random.normal(mu,sigma))
pylab.hist(Nums,bins=20,facecolor='orange')
pylab.title('Normal distribution')
pylab.show()

[Figure: histogram titled Normal distribution, drawn in orange; x-axis from about 2 to 16, y-axis (counts) from 0 to 60.]

6.13.5 Statistics in Python

All this talk about normal distributions makes me hungry for some statistical analysis. There are a large number of statistical packages available in the Enthought
Python Distribution we are using for this class. Numpy has a number of useful
functions and there are many more hidden in the stats module of the Scipy package.

The documentation is still a bit thin, but for a summary of available functions
to get you started, see:
http://docs.scipy.org/doc/numpy/reference/routines.statistics.html
and
http://docs.scipy.org/doc/scipy/reference/stats.html
Here we will consider a few useful functions to give you a feel for how they work.
Let's start with 1D arrays and find the mean, standard deviation and sum. For
fun, we can use some of the fake data generated by the random.normal() function
and see how the actual means and standard deviations of a data set drawn from a
normal distribution compare with the true mean of µ and standard deviation σ.
From the normal distribution example, we generated a list of numbers drawn
from a distribution with µ = 10 and σ = 2:
#!/usr/bin/env python
import numpy
from numpy import random
N,mu,sigma=500,10,2
Nums=[]
for i in range(N):
    Nums.append(random.normal(mu,sigma))
ANums=numpy.array(Nums)
print numpy.mean(ANums), numpy.std(ANums),numpy.sum(ANums)
We wanted to calculate the mean, standard deviation and sum of our sample (Nums)
using the functions numpy.mean(), numpy.std() and numpy.sum(). These operate on
arrays (a list passed in gets converted internally), so we first convert to an array
explicitly: ANums=numpy.array(Nums). When we run this code, we get:
10.0428129458 2.0027306608 5021.40647292
Note that you will get a different answer every time you run this because the random
sample really is pretty random. The mean and standard deviation here are 10.042....
and 2.0027, which are close to the true values of 10 and 2 respectively.
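One detail worth knowing about numpy.std(): by default it divides by N (the "population" standard deviation). Passing ddof=1 makes it divide by N-1 instead, which is the usual choice for a sample. Here is a small sketch with made-up numbers (the data values are purely illustrative), written so that it also runs under Python 3:

```python
import numpy

data = numpy.array([2., 4., 4., 4., 5., 5., 7., 9.])  # made-up values, mean is 5

pop_std = numpy.std(data)               # default ddof=0: divide by N
sample_std = numpy.std(data, ddof=1)    # ddof=1: divide by N-1
std_err = sample_std / numpy.sqrt(len(data))  # standard error of the mean

print(numpy.mean(data), pop_std, sample_std, std_err)
```

For large samples such as the N=500 example above the two versions barely differ, but for small samples the difference is noticeable.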
ASSIGNMENT P8:
Modify the statistics example above to calculate 1000 versions of Nums, with an
N of 10. Calculate the mean and standard error (standard deviation/√N) values
for each sample. Plot a histogram of the means. The standard error times 1.96
is the 95% confidence bound for the mean, i.e., the mean ± these bounds should
contain the true mean (10) 95% of the time. For what fraction of the 1000 samples
is this true?
We can also use the statistics methods on n-dimensional arrays. Here, the argument can specify the axis along which we want to do the calculation. Recalling the
arrays from before, we can illustrate the use of the numpy.sum() function as follows:
>>> A= numpy.array([[1,2,3],[4,2,0],[1,1,2]])
>>> A.sum(axis=0)
array([6, 5, 5])
>>> A.sum(axis=1)
array([6, 6, 4])
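The same axis argument works for the other statistics functions. As a short sketch (again runnable under Python 3), here are the column and row means of the same array A, along with the sum over all elements that you get when no axis is given:

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 2, 0], [1, 1, 2]])

col_means = A.mean(axis=0)  # average down each column
row_means = A.mean(axis=1)  # average across each row
total = A.sum()             # no axis: sum over every element

print(col_means, row_means, total)
```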

6.14 Graphical User Interfaces - GUIs

Having introduced you to the joys of command lines, why now a lecture on GUIs?
GUIs work differently than all the other scripts we have been writing in that they
don't just start at one end and proceed through to the end - the program flow can
be controlled by the user. GUIs let command-line-phobic people use your software.
They allow a greater degree of interactivity with your data visualization. They can
streamline data analysis in an intuitive way. And, they are fun.
Okay, how do I make a GUI? GUIs are composed of a number of things like
text boxes, radio buttons, sliders, etc. called widgets. When the program starts, it
first builds the GUI, then waits for something to happen (a menu to be selected or a
mouse click or some data to be entered...) - a state called an event loop. There are
a number of ways of creating GUIs in Python. The oldest and most standard way
is called Tkinter, which is based on the even older Tk toolkit (from Tcl/Tk). The newer
wxPython has now reached some maturity and comes standard with the Enthought
Python Distribution. See:
http://wiki.wxpython.org/Getting%20Started
http://zetcode.com/wxpython/
and
http://wiki.wxpython.org/wxPython%20by%20Example
It seems to be the way things are going, so we'll use wxPython in this class.
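To make the idea of an event loop a bit more concrete, here is a toy version in plain Python. This is only a sketch of the concept and has nothing to do with how wxPython is actually implemented: the class ToyApp and its methods bind() and main_loop() are invented for illustration, and a list of event names stands in for the user's mouse clicks.

```python
class ToyApp:
    """A toy stand-in for a GUI toolkit: stores handlers and dispatches events."""

    def __init__(self):
        self.handlers = {}  # event name -> callback function
        self.log = []       # record of what the callbacks did

    def bind(self, event_name, callback):
        """Remember which function to call when event_name happens."""
        self.handlers[event_name] = callback

    def main_loop(self, events):
        """Handle events in order until a callback returns False."""
        for name in events:
            callback = self.handlers.get(name)
            if callback is None:
                continue        # nobody is listening for this event
            if callback(name) is False:
                break           # a callback asked the loop to stop

app = ToyApp()
app.bind("click", lambda name: app.log.append("clicked"))
app.bind("quit", lambda name: False)
app.main_loop(["click", "move", "click", "quit", "click"])
print(app.log)
```

The program flow is decided by the sequence of events, not by the order of the lines in the script: the "move" event is ignored because nothing is bound to it, and the final "click" never gets handled because "quit" ended the loop.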


There are six essential parts to a wxPython GUI:

import wx                        Imports the necessary wxPython package.
app = App()                      App() is a subclass of wx.App; app is an instance thereof.
OnInit() or __init__() function  Specifies what happens on initialization.
frame = wx.Frame()               Frame is a class for the main windows; frame is a particular window instance.
frame.Show()                     Reveals the window - like pylab.show().
app.MainLoop()                   Starts the program.

Here is a simple example that makes a window with an image in it:

[Figure: annotated code listing. The annotations read: "Did you know this was called the shebang line?"; imports the wxPython package; describes the application subclass; initializes the App subclass; makes an image object; makes the main window (frame in wxPython); describes the frame in this application; prepares image for display; initializes the frame and displays the image; resizes the window around the image; makes a status bar; starts the main event loop.]

which makes this window:

[Figure: the resulting window containing the image.]


Elements of a GUI:

Widgets               Things you put on your GUI like text boxes, buttons, radio buttons,
                      check boxes, pull down menus, etc. See figure below.
Events                Things that trigger actions like mouse movements and clicks,
                      menu selections, button clicks, etc.
Dialogs and Messages  Pop up windows with information or action items.
Layout                Setting up your GUI with a nice layout.
Input and Output      Getting data in and out.

Here are some common GUI widgets:

[Figure: Common GUI widgets, showing check boxes, buttons, list boxes, menus, text entry and radio buttons.]

From wxPython in Action, by Rappin - a great intro to GUIs.

The example is for a simple editor illustrating widgets, events, dialogs, and layout.

[Figure: annotated code listing for the simple editor. The annotations read: make a container to put things in; make a box for layout management; make a text editing window & add to box; make a button, bind it to OnQuit & add to box; fix up layout a bit. For OnQuit: the dialog pop up returns a button id as result, then dies; kills the program if the result is the OK button.]

which looks like this:

[Figure: the editor window.]

Now let's add some file I/O:

[Figure: annotated code listing for the editor with file I/O. The annotations read: define some ID numbers for future use; make a Menu object, append two options with a separator; make a menuBar object, append the Menu object under the label File and stick it on the top of the frame; bind actions to each menu item (e.g., OnOpen to ID_OPEN); OnOpen gets the file name, opens it and reads it into the text editor (named control).]

which creates a text box and a file menu. You can open a file for editing (but saving
it costs extra).

[Figure: the editor window with a File menu, annotated "You can edit this, but you can't save it."]

6.14.1 Interacting with plots

In the section on statistics, I demonstrated a program to calculate Gaussian data and
plot them as a histogram. The mean and standard deviation were hardwired into
the code. There are a bunch of ways we could make this program more customizable,
some of which use GUIs:
1. read in the mean and standard deviation using raw_input() queries (annoying!)
2. read them in from a file using a redirect <
3. read them in as command line switches
4. read them in from Entry boxes using a GUI, construct a command and run
the program automatically through the command line as in the above ideas.
5. read them in from Entry boxes using a GUI, and call a method from within
the GUI
You should already know how to do the first three options from what you have
already learned. But here are a few examples for fun.
Option 1) Program queries with raw_input():
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,sys
import exceptions # exception classes (these are also built in, so this import is optional)
from numpy import random
"""random_2_i"""
N,mu,sigma=500,10,2
pylab.ion()
fig=pylab.figure(1,figsize=(5,4))
while 1:
    Nums=[]
    try: # executes unless error!
        mu=float(raw_input('Enter mean: '))
        sigma=float(raw_input('Enter sigma: '))
        for i in range(N):
            Nums.append(random.normal(mu,sigma))
        pylab.hist(Nums,bins=20,facecolor='orange')
        pylab.title('Normal distribution')
        pylab.draw()
        ans=raw_input("Press return to continue, q to quit: ")
        if ans=='q': break
        pylab.clf() # clears figure so we can plot a new one
    except ValueError: # executes if float() cannot convert the entry
        print 'Invalid entries, try again'
print 'Quitting'
which produces this:

The program will keep updating the figure until you type q. There are two new
things in this code snippet, which could come in handy: 1) the pylab.clf() command,
which clears the figure and allows us to plot a new one and 2) error trapping with
try and except. In this example, when you enter an invalid number, the
error trap syntax:
try:
    execute some code if no error happens
except:
    do this if you get an error - note you can specify the
    error type.
will give you a warning and then go back up to the top of the while loop.
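The error trap can be tried out on its own, without any plotting. The sketch below wraps the conversion in a small helper; the name parse_float and the default argument are made up for illustration, and the code is written so that it also runs under Python 3. Note that float() raises a ValueError when handed text it cannot convert.

```python
def parse_float(text, default=None):
    """Convert text to a float; return default if the text is not a number.

    parse_float is an illustrative helper, not part of the class code.
    """
    try:
        return float(text)   # runs unless float() raises an error
    except ValueError:       # raised for text such as 'ten'
        return default

for entry in ["10", " 2.5 ", "ten"]:
    value = parse_float(entry)
    if value is None:
        print("Invalid entry:", entry)
    else:
        print("Got", value)
```

Naming the error type after except means that only that kind of error is trapped; anything unexpected still stops the program with a traceback, which is usually what you want while debugging.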
Option 2) Read from standard input
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,sys
from numpy import random
"""random_2_file.py"""
fig=pylab.figure(1,figsize=(5,4))
pylab.title('Normal distribution')
line=sys.stdin.readline()
N,Nums=500,[]
params=line.split()
mu,sigma=float(params[0]),float(params[1])
for i in range(N):
    Nums.append(random.normal(mu,sigma))
pylab.hist(Nums,bins=20,facecolor='orange')
pylab.show()
We can execute this program with a command like:
% random_2_file.py < random_file.dat

The file random_file.dat contains the desired parameters: the mean and sigma,
separated by white space, on its first line.


Option 3) Command line switch
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,sys
from numpy import random
"""random_2_switch.py"""
N,mu,sigma=500,10,2
pylab.ion()
if '-mu' in sys.argv: mu=float(sys.argv[sys.argv.index('-mu')+1])
if '-sig' in sys.argv: sigma=float(sys.argv[sys.argv.index('-sig')+1])
Nums=[]
for i in range(N):
    Nums.append(random.normal(mu,sigma))
pylab.hist(Nums,bins=20,facecolor='orange')
pylab.title('Normal distribution')
pylab.draw()
raw_input("Any key to quit")
This uses the already familiar use of command line switches and can be run with
a command like:
% random_2_switch.py -mu 10 -sig 2
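If a program has more than a couple of switches, the lookup can be wrapped in a small function. This is a sketch only: the helper name get_switch is made up, and a hand-built list stands in for sys.argv so the logic can be tested without the command line. It is written so that it also runs under Python 3.

```python
def get_switch(argv, flag, default):
    """Return the float that follows flag in argv, or default if flag is absent.

    get_switch is an illustrative helper, not part of the class code.
    """
    if flag in argv:
        i = argv.index(flag)
        if i + 1 < len(argv):        # make sure a value actually follows the flag
            return float(argv[i + 1])
    return default

# stand-in for sys.argv after typing: random_2_switch.py -mu 10 -sig 2
argv = ["random_2_switch.py", "-mu", "10", "-sig", "2"]
mu = get_switch(argv, "-mu", 0.0)
sigma = get_switch(argv, "-sig", 1.0)
N = get_switch(argv, "-n", 500)      # not on the command line, so the default is used
print(mu, sigma, N)
```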

Option 4) Construct command line and run from within GUI:

#!/usr/bin/env python
import os,wx
""" gauss_GUI_cmd.py"""
ID_MEAN,ID_STD=101,102
class App(wx.App):
    def OnInit(self):
        self.frame=MyFrame(None,-1,'Gauss command line')
        self.frame.Center()
        self.frame.Show()
        return True
class MyFrame(wx.Frame):
    def __init__(self, parent,id,title):
        wx.Frame.__init__(self,parent,id,title,(-1,-1),wx.Size(300,300))
        panel=wx.Panel(self,-1)
        self.mean_label=wx.StaticText(panel,-1,"Mean",wx.DLG_PNT(panel,15,5))
        self.mean_entry=wx.TextCtrl(panel,ID_MEAN,"",wx.DLG_PNT(panel,40,5),\
            wx.DLG_SZE(panel,80,12))
        self.std_label=wx.StaticText(panel,-1,"STD",wx.DLG_PNT(panel,15,20))
        self.std_entry=wx.TextCtrl(panel,ID_STD,"",wx.DLG_PNT(panel,40,20),\
            wx.DLG_SZE(panel,80,12))
        button=wx.Button(panel,-1,"Plot",wx.DLG_PNT(panel,40,45),wx.DLG_SZE(panel,25,12))
        button.Bind(wx.EVT_BUTTON,self.PlotGauss)
    def PlotGauss(self,event):
        # build the same command we would type at the prompt, then run it
        commandstring='random_2_switch.py -mu '+self.mean_entry.GetValue()\
            +' -sig '+self.std_entry.GetValue()
        print 'running: ',commandstring
        os.system(commandstring)
app=App(False)
app.MainLoop()

And we get:

Option 5) Embedding plot in GUI


This is the most elegant option of all and makes us real GUI experts:
Basic Structure:
#!/usr/bin/env python
import matplotlib; matplotlib.use('TkAgg')
from matplotlib.backends.backend_wx import FigureCanvasWx,\
    FigureManager, NavigationToolbar2Wx # import some matplotlib tools for wxPython
import numpy,pylab,wx
ID_MEAN,ID_STD,ID_PLOT=wx.NewId(),wx.NewId(),wx.NewId() # makes wxPython assign ids
class PlotFigure(wx.Frame): # initializes plot window
    def __init__(self):
        wx.Frame.__init__(self, None, -1, "matplotlib in wxFrame")
        # HERE WE INITIALIZE SOME STUFF LIKE BUTTONS
        # AND THE PLOT WINDOW
    def PlotGauss(self, event):
        pass # HERE WE DO THE PLOTTING
    def OnQuit(self,event): # quits nicely
        self.Destroy()
app = wx.PySimpleApp() # a quick start for simple apps.
frame = PlotFigure()
frame.Show()
app.MainLoop()

Let's take a closer look at the PlotFigure class. This is where I design the basic
GUI. I want a plot window to stick my matplotlib figure in. I want a toolbar under
my figure. I want some text entry boxes for putting in the mean and standard
deviation. I want a button to trigger a new plot, and I want a button to quit the
program nicely. These all need to be laid out in a reasonable fashion.
So, first, I need to make some boxes to put things in. Actually, I need two
boxes for this: one to put all the text entry and plot widgets in and one to assemble
everything nicely in.
To create the entry and plot widgets and put them in a box I can use the
following code:
box=wx.BoxSizer(wx.HORIZONTAL) # makes a box to put things in - horizontally
self.mean_label=wx.StaticText(self,-1,"Mean") # makes a label
self.mean_entry=wx.TextCtrl(self,ID_MEAN,"") # makes a text entry box
self.mean_entry.SetValue('10.') # initializes the value to 10.
self.std_label=wx.StaticText(self,-1,"STD")
self.std_entry=wx.TextCtrl(self,ID_STD,"")
self.std_entry.SetValue('2.')
self.plotbutton=wx.Button(self,-1,"Plot") # makes a button labeled Plot
self.plotbutton.Bind(wx.EVT_BUTTON,self.PlotGauss) # binds button to PlotGauss
self.quitbutton=wx.Button(self,-1,"Quit") # makes a button labeled Quit
self.quitbutton.Bind(wx.EVT_BUTTON,self.OnQuit) # binds button to OnQuit
box.Add(self.mean_label, 0, wx.GROW) # puts the mean label into the box
box.Add(self.mean_entry, 0, wx.GROW) # puts in the mean text entry box
box.Add(self.std_label, 0, wx.GROW) # puts in the std label
box.Add(self.std_entry, 0, wx.GROW) # puts in the std text entry box
box.Add(self.plotbutton, 0, wx.GROW) # puts in the plot button
Now let's work on the main box - here called sizer:
self.fig = pylab.figure(num=1,figsize=(5,4)) # make a figure instance
self.canvas = FigureCanvasWx(self, -1, self.fig) # make a canvas object, attach the figure
self.toolbar = NavigationToolbar2Wx(self.canvas) # make a toolbar
self.toolbar.Realize() # attach it to the canvas
tw, th = self.toolbar.GetSizeTuple() # get the size of the toolbar
fw, fh = self.canvas.GetSizeTuple() # get the size of the canvas
self.toolbar.SetSize(wx.Size(fw, th)) # set the width of the toolbar to the canvas
sizer = wx.BoxSizer(wx.VERTICAL) # make a box called sizer, things will add vertically
sizer.Add(self.canvas, 1, wx.LEFT|wx.TOP|wx.GROW) # put in the canvas
sizer.Add(box, 0, wx.GROW) # put in the widget box (with text entry boxes, etc.)
sizer.Add(self.quitbutton, 0, wx.GROW) # put in the quit button
sizer.Add(self.toolbar, 0, wx.GROW) # put in the toolbar
self.SetSizer(sizer) # resize things nicely
self.Fit() # make things fit in properly
self.fig.add_subplot(111) # put a starter plot in the figure
self.PlotGauss(self) # draw the first histogram (needs the canvas, so it comes last)
And finally the plot function bit:
def PlotGauss(self, event):
    pylab.figure(num=1) # set the current figure to #1
    pylab.clf() # clear the current figure
    mu=float(self.mean_entry.GetValue()) # retrieve mu, sigma from their boxes
    sigma=float(self.std_entry.GetValue()) # and make them floats
    N,Nums=500,[]
    for i in range(N):
        Nums.append(random.normal(mu,sigma))
    pylab.hist(Nums,bins=20,facecolor='orange')
    self.canvas.draw() # redraw the plot window
    self.canvas.gui_repaint() # refresh the canvas frame with the new plot
Putting it all together we have:
#!/usr/bin/env python
import matplotlib; matplotlib.use('TkAgg')
from matplotlib.backends.backend_wx import FigureCanvasWx,\
    FigureManager, NavigationToolbar2Wx
import numpy,pylab,wx
from numpy import random
"""Program gauss_GUI_embed.py"""
ID_MEAN,ID_STD,ID_PLOT=wx.NewId(),wx.NewId(),wx.NewId()
class PlotFigure(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, -1, "matplotlib in wxFrame")
        self.fig = pylab.figure(num=1,figsize=(5,4))
        self.canvas = FigureCanvasWx(self, -1, self.fig)
        self.toolbar = NavigationToolbar2Wx(self.canvas)
        self.toolbar.Realize()
        tw, th = self.toolbar.GetSizeTuple()
        fw, fh = self.canvas.GetSizeTuple()
        self.toolbar.SetSize(wx.Size(fw, th))
        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.canvas, 1, wx.LEFT|wx.TOP|wx.GROW)
        box=wx.BoxSizer(wx.HORIZONTAL)
        self.mean_label=wx.StaticText(self,-1,"Mean")
        self.mean_entry=wx.TextCtrl(self,ID_MEAN,"")
        self.mean_entry.SetValue('10.')
        self.std_label=wx.StaticText(self,-1,"STD")
        self.std_entry=wx.TextCtrl(self,ID_STD,"")
        self.std_entry.SetValue('2.')
        self.plotbutton=wx.Button(self,-1,"Plot")
        self.plotbutton.Bind(wx.EVT_BUTTON,self.PlotGauss)
        self.quitbutton=wx.Button(self,-1,"Quit")
        self.quitbutton.Bind(wx.EVT_BUTTON,self.OnQuit)
        box.Add(self.mean_label, 0, wx.GROW)
        box.Add(self.mean_entry, 0, wx.GROW)
        box.Add(self.std_label, 0, wx.GROW)
        box.Add(self.std_entry, 0, wx.GROW)
        box.Add(self.plotbutton, 0, wx.GROW)
        sizer.Add(box, 0, wx.GROW)
        sizer.Add(self.quitbutton, 0, wx.GROW)
        sizer.Add(self.toolbar, 0, wx.GROW)
        self.SetSizer(sizer)
        self.Fit()
        self.fig.add_subplot(111)
        self.PlotGauss(self)
    def PlotGauss(self, event):
        pylab.figure(num=1)
        pylab.clf()
        mu,sigma=float(self.mean_entry.GetValue()),float(self.std_entry.GetValue())
        N,Nums=500,[]
        for i in range(N):
            Nums.append(random.normal(mu,sigma))
        pylab.hist(Nums,bins=20,facecolor='orange')
        self.canvas.draw()
        self.canvas.gui_repaint()
    def OnQuit(self,event):
        self.Destroy()
app = wx.PySimpleApp()
frame = PlotFigure()
frame.Show()
app.MainLoop()
This gives a completed GUI that responds to the text entry and replots the
histogram when the plot button is clicked. It also quits nicely when you click on
quit.

6.14.2 Event Handling in matplotlib

In the section on matplotlib, we learned about the pylab module. Behind the scenes,
pylab is an interface to three layers in matplotlib: the FigureCanvas (the area onto
which the figure gets drawn), the Renderer (the thing that does the drawing) and
the Artist (controls the Renderer to paint on the FigureCanvas). Up to now, we let
pylab handle the details for us. To take control of the plots (placement of figures,
fonts, tickmarks, axes...) you need to know more about Artist containers (Figure,
Axis, and Axes) and things that get drawn in them called primitives (lines,
rectangles, text, images). For a nice tutorial look in the matplotlib documentation:
http://matplotlib.sourceforge.net/users/artists.html#artist-tutorial
Artist
Artist objects (like lines, tick marks, axes, text) are all configurable. When you use
a command like add_axes or add_subplot you create an Axes instance. Remember
how we made an Axes instance and called it ax?
fig=pylab.figure()
ax=fig.add_subplot(111)
Every time you plot something on fig with, e.g., the plot command, you create an
object of the Line2D class (yes, even points). pylab keeps a record of each of these
plot instances by adding to a list associated with the Axes instance, which you can
retrieve through the attribute lines (e.g., ax.lines is the list of Line2D objects on
the Axes instance ax). BTW: This is how the legend command works, if you remember.
So by identifying the line you want in the list, you can change its attributes (e.g.,
color, linewidth, linestyle or marker). I realize that was way more than you wanted
to know right now, but we will need some idea of what FigureCanvas does in the following.
Line color
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab, numpy
""" Program linecolor.py"""
pylab.ion() # makes plot interactive
fig=pylab.figure() # makes a figure instance
ax=fig.add_subplot(111) # Axes instance
t=numpy.arange(0,1,.01)
s=numpy.sin(2*numpy.pi*t)
c=numpy.cos(2*numpy.pi*t)
ax.plot(t,s,color='blue',lw=2) #Line2D instance
ax.plot(t,c,color='magenta',lw=2)
pylab.draw()
print ax.lines # prints all your plot instances
print 'last line color: ',ax.lines[-1].get_color()
raw_input("Any key to change last line to red ")
ax.lines[-1].set_color('red') # sets last line to red
pylab.draw()
raw_input()

Mouse events
Earlier, we learned how to make GUIs using wxPython and even managed to embed
a matplotlib figure into a wxPython Frame. But we couldn't interact with the
plot directly, say by clicking on an individual box in the tictactoe example and
having the program place an X or an O there. To do this, we need the program to
recognize key or mouse events and return these to the program. In wxPython we had
events (EVT_BUTTON) which were connected to callback functions (like OnQuit)
using the method Bind. In matplotlib we have a similar event, button_press_event,
which can be connected to a function (e.g., onclick) using the FigureCanvas method
mpl_connect, e.g.:
fig.canvas.mpl_connect('button_press_event', \
    onclick)

Mouse events are the most common type of interaction with plots. We can use
them to identify data points, say in a digitizer, or to flag them as bad, or pick
them as special (e.g., P wave arrival, stratigraphic tie point, start or end point for
a calculation). matplotlib supports several mouse events:

Event Name            Description
button_press_event    mouse button is pressed
button_release_event  mouse button is released
motion_notify_event   mouse motion
scroll_event          mouse scroll wheel is rolled
pick_event            an Artist object is selected
Here is an example of a button press event:
#!/usr/bin/env python
""" Program onclick.py"""
import matplotlib, numpy
matplotlib.use("TkAgg")
import pylab
from matplotlib.backend_bases import FigureCanvasBase # imports canvas tools
pylab.ion()
fig=pylab.figure()
ax=fig.add_subplot(111)
data=numpy.random.rand(10) # get 10 random numbers
ax.plot(data)
ax.plot(data,'ro')
pylab.draw()
def onclick(event):
    print 'button=%d, x=%d, y=%d, xdata=%f, ydata=%f'%(event.button, event.x, event.y, \
        event.xdata, event.ydata)
cid=fig.canvas.mpl_connect('button_press_event',\
    onclick) # connect the button press to the function onclick
raw_input() #pauses the program
You could combine this with the editor we wrote before to make a digitizer!
In the last example, we just identified the location of the mouse click, but didn't
identify any particular plot object. In principle, each object (line, text, rectangle,
axes) could be picked. There is a catch however. When you create the object, you
have to do these things too:
1. set the picker to True (and usually some floating point tolerance).
2. connect the pick_event to some action
3. define the action.
Consider the following program:
#!/usr/bin/env python
import matplotlib; matplotlib.use("TkAgg")
import pylab,numpy
class LineColor:
    """connects the picker to an Artist Line2D object and changes line color"""
    def __init__(self,line):
        self.line=line
        self.connect=\
            self.line.figure.canvas.mpl_connect(\
            'pick_event',self.on_pick)
    def on_pick(self,event): # finds right line
        if event.artist!=self.line: return
        self.line.set_color('red') # makes red
        self.line.set_linewidth(2) # makes fatter
        self.line.figure.canvas.draw() # redraws line
def main():
    """
    Program clickme.py
    Plots some lines that are clickable, connects them to the LineColor action.
    """
    fig=pylab.figure()
    ax=fig.add_subplot(111)
    t=numpy.arange(0,1,.01)
    s=numpy.sin(2*numpy.pi*t)
    ax.plot(t,s,color='blue',picker=True)
    ax.plot(t+.25*numpy.pi,s,color='magenta',picker=True)
    ax.plot(t+.5*numpy.pi,s,color='cyan',picker=True)
    lines=[] # makes a list to store clickable line objects
    for line in ax.lines: # steps through list of plot objects
        ln=LineColor(line) # make the line a LineColor object
        lines.append(ln) # stores in the list
    pylab.show()
main()
The class LineColor is the action that happens when a plot object gets clicked on.
On initialization, it connects the click action to the function on_pick which turns
the object red and fattens it up a bit. The main program draws some lines. Each
plot instance gets stored in the list ax.lines. Then the objects in ax.lines get turned
into clickable LineColor objects. When you click on a line, it turns red and fattens
up a bit.
Your final is to do a tic-tac-toe program, and it would be fun to make it work
inside a GUI. To get you most of the way there, I wrote a really silly Tic-Tac-Toe
program. It is not a GUI, just a clickable matplotlib window:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
import numpy,sys,exceptions
from numpy import random
def finish(who,xline,yline):
    """ Check who won"""
    if len(xline)>0:
        if who==0: # player wins
            pylab.plot(xline,yline,'r-')
            print 'you have won!'
        else: # computer wins
            pylab.plot(xline,yline,'g-')
            raw_input('you have lost to a stupid computer!')
        pylab.draw()
        cid = fig.canvas.mpl_connect('button_press_event', quit) # quit on mouse click
        print 'click anywhere on plot to quit'
def quit(event): # graceful exit
    sys.exit()
def winner(who,myboxes):
    """ checks to see if boxes selected have won """
    # check for diagonals first
    if '11' in myboxes and '22' in myboxes and '33' in myboxes: \
        finish(who, [.5,2.5],[.5,2.5])
    if '13' in myboxes and '22' in myboxes and '31' in myboxes: \
        finish(who, [.5,2.5],[2.5,.5])
    # check for rows
    if '11' in myboxes and '21' in myboxes and '31' in myboxes: \
        finish(who, [.5,2.5],[.5,.5])
    if '12' in myboxes and '22' in myboxes and '32' in myboxes: \
        finish(who, [.5,2.5],[1.5,1.5])
    if '13' in myboxes and '23' in myboxes and '33' in myboxes: \
        finish(who, [.5,2.5],[2.5,2.5])
    # check for columns
    if '11' in myboxes and '12' in myboxes and '13' in myboxes: \
        finish(who, [.5,.5],[.5,2.5])
    if '21' in myboxes and '22' in myboxes and '23' in myboxes: \
        finish(who, [1.5,1.5],[.5,2.5])
    if '31' in myboxes and '32' in myboxes and '33' in myboxes: \
        finish(who, [2.5,2.5],[.5,2.5])
def onclick(event): # if someone clicks in a square, return x,y
    """ what to do for mouse clicks"""
    x,y=event.xdata,event.ydata # assign mouse click to x and y
    if y<1: # first row
        if x< 1: # box 1,1
            xtext,ytext= 1,1
        elif x<2: # box 2,1
            xtext,ytext= 2,1
        else: # box 3,1
            xtext,ytext= 3,1
    elif y<2: # second row
        if x< 1: # box 1,2
            xtext,ytext= 1,2
        elif x<2: # box 2,2
            xtext,ytext= 2,2
        else: # box 3,2
            xtext,ytext= 3,2
    else: # third row
        if x< 1: # box 1,3
            xtext,ytext= 1,3
        elif x<2: # box 2,3
            xtext,ytext= 2,3
        else: # box 3,3
            xtext,ytext= 3,3
    ## check if legal move (if already taken!)
    box=str(xtext)+str(ytext) # make the box name
    if box not in Xs and box not in Os: # box not yet taken
        pylab.text(xtext-.5,ytext-.5,'x',fontsize=24,color='red') # put a red x in
        print '\n your move: box id ', box, '\n'
        Xs.append(box)
        del boxes[boxes.index(box)] # delete box from boxes
        winner(0,Xs) # check if winner, 0 is player
        ## pick a box at random for the computer's move!
        if len(boxes)>0: # the board may already be full
            ind = random.randint(0,len(boxes)) # pick a box at random (upper bound is exclusive)
            pylab.text(int(boxes[ind][0])-.5,int(boxes[ind][1])-.5,'o',\
                fontsize=24,color='green') # put green o in box
            Os.append(boxes[ind])
            winner(1,Os) # check if winner, 1 is computer
            del boxes[ind] # delete box from boxes
    else: # box already taken
        print " \n box ",box, " is taken!, choose another one\n"
def main():
    """ tictactoe2.py program"""
    global fig,Xs,Os,boxes # makes these have global scope
    pylab.ion()
    boxes=['11','12','13','21','22','23','31','32','33'] # some box labels
    Xs=[] # list of all x moves
    Os=[] # list of all o moves
    fig = pylab.figure() # create figure instance
    ax = fig.add_subplot(111) # make a subplot
    ax.plot([0,3,3,0,0],[0,0,3,3,0],'k-') # draws a box
    ax.plot([0,3],[1,1],'k-') # make black lines
    ax.plot([0,3],[2,2],'k-')
    ax.plot([1,1],[0,3],'k-')
    ax.plot([2,2],[0,3],'k-')
    ax.text(0.1,0.1,'11') # labels boxes with their ids
    ax.text(0.1,1.1,'12')
    ax.text(0.1,2.1,'13')
    ax.text(1.1,0.1,'21')
    ax.text(1.1,1.1,'22')
    ax.text(1.1,2.1,'23')
    ax.text(2.1,0.1,'31')
    ax.text(2.1,1.1,'32')
    ax.text(2.1,2.1,'33')
    pylab.draw() # draw the canvas
    cid = fig.canvas.mpl_connect('button_press_event', \
        onclick) # tell what to do for mouse clicks
    raw_input() # make program wait for response
main() # run main program
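The bookkeeping in this program (turning a click position into a box label, and checking for three in a row) can be written more compactly and tested without any plotting. The following is a sketch of one way to do it, not the program above: the names WIN_LINES, box_id and has_won are made up, the click coordinates are assumed to lie between 0 and 3, and the code is written so that it also runs under Python 3.

```python
# every winning combination, using the same 'column row' labels as the program above
WIN_LINES = [
    ("11", "22", "33"), ("13", "22", "31"),                       # diagonals
    ("11", "21", "31"), ("12", "22", "32"), ("13", "23", "33"),   # rows
    ("11", "12", "13"), ("21", "22", "23"), ("31", "32", "33"),   # columns
]

def box_id(x, y):
    """Map plot coordinates (assumed to be in the range 0 to 3) to a label like '21'."""
    col = min(int(x), 2) + 1   # int() truncates, so 0<=x<1 -> 1, 1<=x<2 -> 2, ...
    row = min(int(y), 2) + 1   # min() keeps a click exactly on the edge at 3 in box 3
    return str(col) + str(row)

def has_won(myboxes):
    """Return True if myboxes contains all three boxes of any winning line."""
    return any(all(b in myboxes for b in line) for line in WIN_LINES)

print(box_id(1.5, 0.2), has_won(["11", "22", "33", "12"]), has_won(["11", "12", "23"]))
```

Keeping this logic in small functions that do no drawing makes it easier to swap in a smarter way of choosing the computer's move later on.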
ASSIGNMENT P9:
You have been writing Fortran code to play tic-tac-toe. For this (final) assignment,
write a program that calls on your F90 functions to calculate the computer's
moves instead of randomly choosing an empty box. Tie these in to the Tic-Tac-Toe
program (in the onclick() function) so you can play a more challenging game! Now
put the whole thing inside a GUI.

6.15 3D plotting with Python

Contour plots are really just a way to visualize something that is inherently 3D on
a 2D surface. Think about a topographic map - the contour intervals are elevations
and our brains can reconstruct the 3D world by looking at the contours on the map.
But with computers we can visualize the 3D world in a more realistic manner. There
are lots of 3D plotting packages, and even within Python there are several different
approaches, one using a 3D toolkit of matplotlib that uses the same logic as for
regular matplotlib. For more on this module, see:
http://matplotlib.sourceforge.net/mpl_toolkits/mplot3d/index.html
But for more 3D horsepower, there is a module called mlab, which is part of the
enthought.mayavi module. See:
http://github.enthought.com/mayavi/mayavi/mlab.html
And then there is Mayavi itself, which comes with the Enthought Python Distribution.
This is way beyond what I know, but if you are curious, check out this website:
http://github.enthought.com/mayavi/mayavi/examples.html
Let's start with a 3D version of the gravity anomaly of a buried sphere:

#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,numpy
from mpl_toolkits.mplot3d import axes3d
G=6.67e-11 # grav constant in Nm^2/kg^2 (SI)
R=2. # radius in meters
z=3. # depth of burial
drho=500 # density contrast in kg/m^3
x=numpy.arange(-2.*z,2.*z,0.1)
y=numpy.arange(-2.*z,2.*z,0.1)
X,Y=pylab.meshgrid(x,y)
h=numpy.sqrt(X**2+Y**2)
g= (G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
fig=pylab.figure()
ax=axes3d.Axes3D(fig)
ax.plot_surface(X,Y,g)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
pylab.show()

[Figure: 3D surface plot of the gravity anomaly g over the X, Y grid; the vertical axis ticks run from 0.2 to 1.2.]

In this example, we import the axes3d module from the mplot3d toolkit of
matplotlib. We create an Axes3D instance called ax from the figure object, fig.
Axes3D objects have lots of methods, one of which is plot_surface, which plots
a surface on the meshgrid X,Y of the data in g. Other methods are
set_xlabel() and so on. Note that when you try this example on your own computer,
you can twirl the plot around to see it from various perspectives. You can save any
of these with the little disk icon or with pylab.savefig() as before.
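Before plotting a surface it is often worth checking the numbers in the grid. The sketch below repeats the grid calculation from the example above without any plotting (it uses numpy.meshgrid in place of pylab.meshgrid, and is written so that it also runs under Python 3) and compares the largest value of g with the value of the same formula directly over the sphere, where h=0. The variable name g_peak is made up for illustration.

```python
import numpy

G = 6.67e-11   # grav constant in Nm^2/kg^2 (SI)
R = 2.         # radius in meters
z = 3.         # depth of burial
drho = 500.    # density contrast in kg/m^3

x = numpy.arange(-2.*z, 2.*z, 0.1)
y = numpy.arange(-2.*z, 2.*z, 0.1)
X, Y = numpy.meshgrid(x, y)          # 2D arrays of x and y coordinates
h = numpy.sqrt(X**2 + Y**2)          # horizontal distance from the sphere
g = (G*4.*numpy.pi*R**3*drho) / (3.*(h**2 + z**2))

g_peak = (G*4.*numpy.pi*R**3*drho) / (3.*z**2)   # same formula evaluated at h=0
print(g.shape, g.max(), g_peak)
```

The maximum of the grid should agree with g_peak (about 1.2e-7), which matches the top of the vertical axis in the surface plot.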
To give you a flavor of what mlab can do, here is essentially the same code, but
using functions from mayavi.mlab:
#!/usr/bin/env python
import numpy
from enthought.mayavi import mlab # in newer Mayavi releases: from mayavi import mlab
def g(X,Y):
    G=6.67e-11 # grav constant in Nm^2/kg^2 (SI)
    R=2. # radius in meters
    drho=500 # density contrast in kg/m^3
    h=numpy.sqrt(X**2+Y**2)
    return 1e8*(G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
z=3. # depth of burial
X,Y=numpy.mgrid[-2.*z:2.*z:0.1,-2.*z:2.*z:0.1]
mlab.figure(bgcolor=(1,1,1))
s=mlab.surf(X,Y,g)
mlab.show()

The call to mlab.figure creates a figure instance with the background
color (bgcolor) set to white. In mlab, color gets set with the familiar r,g,b but here
the colors run from 0 (black) to 1 (full strength), so a color of (1,1,1) is white,
(1,0,0) is red, and so on. The default is for a black background. Then the surface
gets drawn with a call to mlab.surf().

There are lots more 3D plotting functions available in the two packages described
here. To whet your appetite, I've picked out a few:

[Figure: Examples from the mplot3d.Axes3D gallery, showing scatter3d(), plot(), plot_wireframe(), bar3d(), PolyCollection(), contour() and plot_surface().]
http://matplotlib.sourceforge.net/examples/mplot3d/

[Figure: Examples from the Mlab gallery, showing imshow(), plot3d(), points3d(), flow(), surf(), contour_surf(), barchart(), mesh(), triangular_mesh(), quiver3d() and contour3d(). Suggested uses noted on the figure include: color contour plots; magnetic field lines?; points in space; flow fields; topography, anomalies...; contour plots; spherical harmonics; tessellation grids; currents, winds, magnetic moments; isopycnals or isothermals; and "no idea....".]
http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/examples.html


Here are some considerations to help you decide which way you want to go:

mplot3d versus mlab

        mplot3d (matplotlib)                      mlab (mayavi)
Pros:   mplot3d is a natural extension of pylab   Prettier
        it is easier to learn                     Interactivity
        pylab functions work for mplot3d too      More functions
                                                  Can be animated :)
Cons:   Limited plotting styles                   no svg output
                                                  ps and eps are buggy
                                                  slower
                                                  harder to learn for pylab masters

6.15.1 Geoscience applications

There wouldn't be much point in learning how to program in a geoscience class if
there were no practical applications. In this section I will very briefly point out a few
applications beyond the obvious X,Y plots and maps that you have already encountered.
Lines of flux
Professor R.L. Parker (our hero and professor emeritus of the SIO department)
wrote a Fortran (f77) program called force.f. This was a slight modification of his
magmap program that calculates magnetic field vectors given a geomagnetic reference
model (see Bob's software website http://igppweb.ucsd.edu/parker/Software/).
The program force.f has since disappeared from Professor Parker's website
but is available on the class website at:
http://mahi.ucsd.edu/class233/force.zip
It draws the magnetic lines of flux from the core outward and saves the data to a
file. You run it with a session like this:
% force

=================
Enter commands for spherical harmonic coefficients (? for help)

igrf 2005
lines 200
radius 0.547
output lines.f05
exec

===================
igrf 2005
lines 200
radius 0.547
output lines.f05
===================
Field evaluated for IGRF 2005
Equatorial radius (km)    6378.170
Polar radius      (km)    6356.910
Reference radius  (km)    6371.200
Evaluation r/a               0.547
Maximum degree of accepted coefficients: 13
Coefficients normalized to radius: 0.54700
New field-line file opened in: lines.f05
Initial and terminal points
   0.047  -0.049   0.998   0.054  -0.040   0.998
  -0.071   0.066  -0.995   0.532   0.164  -0.831
   0.226   0.217  -0.950   0.132   0.266  -0.955
  -0.272   0.686  -0.674  -0.160   0.516   0.842
   0.083   0.987  -0.139   0.123   0.945   0.302
   0.487  -0.385  -0.784   0.479  -0.382  -0.790
 etc.
   0.016   0.856  -0.516   0.105   0.632   0.767
   0.642   0.316  -0.699   0.662   0.306  -0.684
Field line coordinates written to lines.f05

=================
Enter commands for spherical harmonic coefficients (? for help)

quit

Magmap run complete

The red commands are typed by the user and the blue stuff is the program response.
This will create an output file lines.f05 which looks something like this:
   0.047    -0.049     0.998     1.000
   0.054    -0.040     0.998     1.000
       2  50.00000  50.00000  50.00000
  -0.071     0.066    -0.995     1.000
  -0.054     0.070    -1.005     1.009

The first three columns are x, y, z on a magnetic flux line and the fourth is R in
units of core-mantle boundary radii. Each field line is separated from the next by an
entry with the number of points in the previous line and three 50.000s. Some of the
field lines go WAY out in space (100 CMB radii); we'll come back to this later.
Parker also provided a script (look) which chops off the parts of the lines that are
more than 4 radii away, projects the 3D lines onto a plane specified by the user and
saves the data in a new file. It also invokes Parker's most famous program plotxy
(which IS available on his website). Plotxy produces a postscript file mypost.
When run with the command look 45 75 lines.f05, we get:

The thought occurs, wouldn't this look cool in 3D? Here is a little script inspired
by the plot3d() example from the Mlab gallery.

#!/usr/bin/env python
import numpy as np
from enthought.mayavi import mlab
lines=np.fromstring(open('lines.f05').read(),dtype=float,sep=' ')
Xs,Ys,Zs,Rs=lines[0:-4:4],lines[1:-3:4],lines[2:-2:4],lines[3:-1:4]
line=0
lx,ly,lz,lr=[],[],[],[]
mlab.figure(bgcolor=(1,1,1)) # sets the background to white
while line<len(Xs):
    if Rs[line]<5: # truncates far away field lines
        lx.append(Xs[line])
        ly.append(Ys[line])
        lz.append(Zs[line])
        lr.append(Rs[line])
    elif Rs[line]>=5 and Ys[line]!=50.: # detects the 50s
        # skip the distant points (bounds check added so we cannot run off the end)
        while line<len(Rs) and Rs[line]>5 and Rs[line]!=50:
            line+=1
        x,y,z,r=np.array(lx),np.array(ly),np.array(lz),np.array(lr)
        mlab.plot3d(x,y,z,r,colormap='Spectral')
        lx,ly,lz,lr=[],[],[],[]
    if line<len(Rs) and Ys[line]==50.: # separator record: end of a field line
        x,y,z,r=np.array(lx),np.array(ly),np.array(lz),np.array(lr)
        mlab.plot3d(x,y,z,r,colormap='Spectral')
        lx,ly,lz,lr=[],[],[],[]
    line+=1
mlab.show()

which produces something like this (but in 3D you can wiggle it around - much
more fun!):

Eigenvectors
Linear algebra has a lot of applications in the geosciences. One of the most useful
tricks is to calculate what are called eigenparameters. Say you have a bunch of
points and want to calculate a best-fit line through them - but they are in three
dimensions. Or you want a best fit plane, say the fault surface through a bunch of
earthquakes. Or you want to know the principal axis of the moment of inertia tensor.
Or the orientations of stress or strain tensors. Or the preferred orientation of
mineral grains or clasts in a sedimentary deposit. Or the directions of the anisotropy
of just about anything. And the list goes on. Here is an example that finds the
eigenvectors and eigenvalues of what is called the covariance matrix of a bunch of 3D
points. These could be the end points of unit vectors (directions), or point masses
in space, for example.

#!/usr/bin/env python
import numpy
from numpy import linalg
from enthought.mayavi import mlab
dat=open('points.xyz','rU').readlines()
x,y,z=[],[],[]
for line in dat:
    rec=line.strip('\n').split()
    x.append(float(rec[0]))
    y.append(float(rec[1]))
    z.append(float(rec[2]))
X,Y,Z=numpy.array(x),numpy.array(y),numpy.array(z)
T=numpy.array([[numpy.sum(X*X),numpy.sum(X*Y),numpy.sum(X*Z)],\
               [numpy.sum(Y*X),numpy.sum(Y*Y),numpy.sum(Y*Z)],\
               [numpy.sum(Z*X),numpy.sum(Z*Y),numpy.sum(Z*Z)]])
evals,evects=linalg.eig(T)
print 'principal axis: ',evects.transpose()[0],' with variance of ',evals[0]
print 'major axis: ',evects.transpose()[1],' with variance of ',evals[1]
print 'minor axis: ',evects.transpose()[2],' with variance of ',evals[2]
pv=evects.transpose()[0]*3.
mlab.figure(bgcolor=(1,1,1))
mlab.points3d(X,Y,Z,color=(0,0,0),scale_factor=0.25,opacity=.5)
mlab.outline(color=(.7,0,0))
mlab.plot3d([pv[0],-pv[0]],[pv[1],-pv[1]],\
            [pv[2],-pv[2]],tube_radius=0.1,color=(0,1,0))
mlab.show()

This script opens a file called points.xyz which looks like this:
1.4844e+00   5.9620e-02  2.2025e+00
1.6928e+00   4.7284e-02  1.0015e+00
9.7893e-01  -2.7332e-01  6.6763e-01
1.1998e+00  -1.9863e-01  5.1122e-01
7.6767e-01   2.1443e-02  7.9248e-01
1.5916e+00   9.7349e-02  1.8151e+00
1.4537e+00   1.3741e-01  2.1087e+00
etc.

It then parses the data into lists of floating point variables. These get turned into
arrays. Then the sums of the products and squares get put into a coherence matrix
of the form:

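\[
T = \begin{pmatrix}
\sum x^2 & \sum xy & \sum xz \\
\sum xy & \sum y^2 & \sum yz \\
\sum xz & \sum yz & \sum z^2
\end{pmatrix}.
\]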

The function linalg.eig() returns the eigenvalues and eigenvectors of this T matrix.
The largest (principal) eigenvalue corresponds to the principal eigenvector, which
is the axis along which the variance (spread) is the greatest. The minor eigenvector
corresponds to the axis along which the variance is least. One feature of this
function is that each eigenvector is stored as a column of the evects array, so the
first eigenvector is the first column:
evects=
[[ 0.70464008 -0.70917141  0.02362748]
 [-0.001072    0.03223454  0.99947976]
 [ 0.70956409  0.70429883 -0.02195351]]

That is why I take the transpose to print out the vectors:
% points.py
principal axis: [ 0.70464008 -0.001072 0.70956409] with variance of 733.172618519
major axis: [-0.70917141 0.03223454 0.70429883] with variance of 30.501077006
minor axis: [ 0.02362748 0.99947976 -0.02195351] with variance of 8.55431560739
The principal eigenvector gets assigned to the array pv. The function points3d
plots the points a nice shade of grey (opacity low). mlab.outline(color=(.7,0,0))
puts a red box around the figure and mlab.plot3d plots the principal eigenvector as
a green tube. Nice.
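One caveat worth a sketch: numpy's linalg.eig makes no promise about the order in which the eigenvalues come back, so treating element 0 as the principal axis works only when the largest eigenvalue happens to come first. The example below is a minimal illustration, not part of the original script. It uses made-up random points in place of points.xyz and the variable names are my own. It uses linalg.eigh (intended for symmetric matrices such as T, and documented to return eigenvalues in ascending order) and then sorts explicitly from largest to smallest. It is written so that it runs under Python 2 or 3.

```python
import numpy as np

# Hypothetical stand-in for points.xyz: 200 points spread mostly along x.
rng = np.random.RandomState(0)
pts = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

# Same matrix of summed squares and products as T in the script above.
T = np.dot(pts.T, pts)

# eigh is for symmetric matrices; eigenvalues come back in ascending order.
evals, evects = np.linalg.eigh(T)

# Reorder from largest to smallest. Eigenvectors are the COLUMNS of evects,
# so the columns must be reordered with the same index array.
order = np.argsort(evals)[::-1]
evals = evals[order]
evects = evects[:, order]

principal = evects[:, 0]
print("principal axis: %s" % principal)
print("eigenvalues, largest first: %s" % evals)
```

With this ordering, column 0 is always the direction of greatest spread, whatever order the solver used internally.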

Chapter 7

Peter's Python Notes

These notes will reflect my attempt to learn Python from Lisa's lecture notes. My
strategy will be to write a series of programs that mimic the programs in my Fortran90 notes.
This assumes that Enthought Python is installed. To verify this, enter:
% which python
which should return: /Library/Frameworks/Python.framework/Versions/Current/bin/python
It is possible to run Python from the command line (like Matlab), in which
case it functions like a fancy calculator. However, it quickly becomes tiresome to
enter everything manually, so we will start immediately with Python scripts, which
execute a series of Python commands. These are text files that end in .py and can
be thought of as Python source code. However, unlike F90 source code they do not
need to be compiled before they are run. Python is similar to Matlab in this way.
However, both Python and Matlab are much slower to run than compiled languages
like Fortran and C.
My first program is called printmess.py:
#! /usr/bin/env python
# simple Python test program (printmess.py)
print 'test message'
Provided one has execute permission on this file (chmod 755 printmess.py), this can
be run by entering:
% ./printmess.py
test message
%
CHAPTER 7. PETERS PYTHON NOTES

Notice that we need to start with ./ because . is not in our path (for security
reasons).
The first line MUST be:
#! /usr/bin/env python
so that the file is interpreted as Python. Unlike Fortran or C, you CANNOT start
with a comment line (try switching lines 1 and 2 and see what happens).
The second line is a comment line. Anything to the right of # is assumed to be
a comment (in Fortran ! serves the same function).
Notice that print goes by default to your screen. You can use single or double
quotes for the test message. You can get an apostrophe in your output by using
double quotes and quote marks by using single quotes, i.e.,
#! /usr/bin/env python
# simple Python test program 2 (printmess2.py)
print "The pump don't work cuz the vandals took the handles"
print 'She said "I know what it\'s like to be dead"'

produces:
% ./printmess2.py
The pump don't work cuz the vandals took the handles
She said "I know what it's like to be dead"
%
In the second print statement, the backslash in front of the apostrophe is necessary
to prevent an error (try it).
ASSIGNMENT P1
Write a Python script to print your favorite pithy phrase.

7.1 How to multiply two integers

Here is a simple Python program to multiply 2 and 3 (multint.py):


#! /usr/bin/env python
a = 2
b = 3
c = a*b
print 'product = ', c

The program uses three variables, the letters a, b and c. Variable names should
always start with a letter. The remaining characters can be any combination of
letters, numbers, and underscores (_). Dashes (-) are NOT allowed, because Python
reads them as minus signs. Unlike Fortran, variables are
CASE SENSITIVE (x is different from X). Variables in Python can be of many
types, including real (floating point), integer, complex, string, and logical. However,
unlike C or Fortran their type is not explicitly defined. Instead it is determined by
the syntax upon first use, i.e.,
x = 2      defines x as integer
x = 2.     defines x as real
x = '2'    defines x as a string
Basic Python has a very limited number of math operations, which include +, -,
* (multiply), / (divide), ** (to power of), and % (remainder). To get more advanced
math operations, use the numpy module (see below).
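To see how the type follows from the way a value is written, here is a short sketch using the built-in type() function. The list of test values is my own choice for illustration. The print calls are written with parentheses so the script runs under Python 2 or 3.

```python
# Each value is written with a different syntax, so each gets a different type.
names = []
for value in [2, 2., '2', 2 + 0j, True]:
    names.append(type(value).__name__)
    print("%r is of type %s" % (value, type(value).__name__))
```

The same variable name can be reused for a value of another type at any time; Python simply rebinds the name.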

7.1.1 Declaring variables

Consider the following program with a typo in line 4:


#! /usr/bin/env python
abeg = 2.1
aend = 3.9
adif = aend - abge
print 'adif = ', adif
You had intended to type abeg but typed abge instead. When you run the
program, you get an error message:
Traceback (most recent call last):
File "./undeclared.py", line 4, in <module>
adif = aend - abge
NameError: name 'abge' is not defined
This is a desirable feature of Python. You don't want the program to run
by assigning some arbitrary value to abge and giving you a wrong answer. Yet
many languages will do exactly that, including Fortran (we can avoid this potential
problem in Fortran by using the implicit none statement at the beginning of our
programs).

7.1.2 Alternate coding options

There are always lots of ways to write the same program. Here is another way to
write the multint.py code (multint2.py):
#! /usr/bin/env python
a = 2; b = 3; print 'product = ', a*b
Notice that more than one command can be included on a line if the commands are
separated by a semicolon (also OK in F90). This makes the code more like C. This
option is rarely a good idea because it makes the code harder to read. Unless you
have a really good reason to put more than one command on a line (saving space is
NOT a good reason!), I suggest that you never use semicolons in this way.

7.2 Numpy

Standard Python only includes basic arithmetic operators (+, -, *, /, ** (power),
and % (modulus operator, see gcf example below)). For sqrt, trig and other
higher-order math functions, we need to import the numpy module and then call
the individual functions through it. For example (testnumpy.py):
#! /usr/bin/env python
import numpy
print 'pi = ', numpy.pi
print 'sin(pi/6) = ', numpy.sin(numpy.pi/6.)
produces:
% ./testnumpy.py
pi = 3.14159265359
sin(pi/6) = 0.5
Notice that, like almost all languages, the trig arguments are in radians.
It can be annoying to have to type numpy. lots of times. We can save typing by
importing numpy as a name that we assign, i.e.,
#! /usr/bin/env python
import numpy as np     # or anything else
print 'pi = ', np.pi
print 'sin(pi/6) = ', np.sin(np.pi/6.)
You can avoid having to type even the np. by importing everything:
from numpy import *

or just a few functions:


from numpy import pi, sqrt
in which case one can use pi and sqrt with no prefixes. However, this is NOT
recommended because one can lose track of where things come from. Your code will
be clearer and less error prone if you include the prefix to show which functions and
variables are coming from numpy.
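Here is one concrete reason to keep the prefixes. The standard math module and numpy both define functions called sqrt, sin, and so on, but they do not behave the same way, and with prefix-free imports it is easy to end up calling the wrong one. The sketch below (the test values are my own) shows the difference for sqrt: the numpy version works on a whole array, while the math version accepts only a single number.

```python
import math
import numpy as np

values = [1., 4., 9.]

# numpy.sqrt works element by element on an array.
roots = np.sqrt(np.array(values))
print("numpy.sqrt on an array: %s" % roots)

# math.sqrt takes one number only; a list raises a TypeError.
try:
    math.sqrt(values)
    math_takes_list = True
except TypeError:
    math_takes_list = False
print("math.sqrt accepts a list: %s" % math_takes_list)
print("math.sqrt(4.) = %.1f" % math.sqrt(4.))
```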

7.3 Making a trig table using a for statement

Any advanced programming language must provide a way to loop over a series
of values for a variable. In C, this is most naturally implemented with the for
statement. In FORTRAN this is done with the do loop. Python uses its own
version of the for loop.
Here is an example program that generates a table of trig functions:
#! /usr/bin/env python
import numpy as np
degrad = 180./3.1415927
for theta in range(0, 90, 1):
    ctheta = np.cos(theta/degrad)
    stheta = np.sin(theta/degrad)
    ttheta = np.tan(theta/degrad)
    print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)
Notice the use of the variable degrad to convert from degrees to radians. Next comes
for theta in range(0, 90, 1):
This begins the for loop and contains a lot of interesting syntax. Unlike most
languages (e.g., C or Fortran), the lines that follow that are inside of the for loop
MUST be indented. Statements that expect a subsequent indentation level end
in a colon (:). For clarity, most people indent loops in other languages, but the
indentation is optional. Here the indentation is how Python knows what is inside
the loop and where the loop ends, because Python has NO STATEMENT TO END
THE LOOP (e.g., enddo in Fortran). The key point is that the 4 lines following
the for statement, must ALL BE INDENTED EXACTLY THE SAME. Tabs and
blanks are treated differently in Python, so two lines may appear to have the same
indentation but actually be different. Thus, it is safest to NEVER USE TABS so
that one can always see exactly what the true indentation is.

Within the for loop, theta will assume a series of specified values, in this case
given by range(0, 90, 1). Range is an integer function that in this case will assume
the values from 0 to 89 (!), step 1. Why 89? Because it is the last number less than
90. You're right to think this makes no sense but it is typical of Python syntax; the
upper limit is always one more than the last value used. The 1 for step is optional
because 1 is the default step size, i.e., range(0, 90) would give the same numbers.
Note that because range is an integer function, you cannot enter real values as limits
to define theta as real in the for loop. This is consistent with F90, which does not
allow real do loops, although they were OK in older versions of Fortran. Loops
with real increments are not considered good programming practice because of the
precision problems that repeated sums can cause.
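The behavior of range is easy to check directly. The sketch below confirms the 0 to 89 limits and shows that a real limit is rejected. It then shows one common way to get evenly spaced real values without repeated sums, numpy's linspace function, which takes the two end points and the number of values. The end points and count are arbitrary choices of mine, and the script runs under Python 2 or 3.

```python
import numpy as np

# range stops one short of its upper limit.
thetas = list(range(0, 90, 1))
print("first = %d, last = %d, count = %d" % (thetas[0], thetas[-1], len(thetas)))

# A real (floating point) limit is not accepted.
try:
    range(0, 1.5)
    real_limit_ok = True
except TypeError:
    real_limit_ok = False
print("range accepts a real limit: %s" % real_limit_ok)

# For real values, ask numpy for 5 evenly spaced points from 0 to 1 inclusive.
xs = np.linspace(0., 1., 5)
for x in xs:
    print("%5.2f" % x)
```

Because linspace computes each value from the end points rather than by adding a step over and over, the last value lands exactly on the upper limit.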
Next comes
ctheta = np.cos(theta/degrad)
This uses the numpy cosine function. Ideally theta should be a real variable in this
expression, but fortunately Python figures out that int/real should be real (see more
discussion of this later). Better style is to write
ctheta = np.cos(float(theta)/degrad)
where float converts from integer to real (int converts from real to integer), or for
maximum efficiency we could write
rtheta_deg = float(theta)/degrad
ctheta = np.cos(rtheta_deg)
stheta = np.sin(rtheta_deg)
etc.
To make the output look nice, we do not use
print theta, ctheta, stheta, ttheta
which would space the numbers irregularly among the columns. Instead, we explicitly specify the output format using a format specification:
print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)
Here the output format is given in the single quotes. The format for each number
follows a % sign: %5.1f means a floating point field 5 characters wide with 1 digit
to the right of the decimal point (in Fortran this is f5.1). The
single blank space between the formats is reproduced exactly in the output, thus to
put commas between the output numbers, write:

print '%5.1f, %8.4f, %8.4f, %8.4f' %(theta, ctheta, stheta, ttheta)


Another way to write the print line is:
print '%5.1f %8.4f %8.4f %8.4f' \
  %(theta, ctheta, stheta, ttheta)
Here we have used the continuation character to split the statement into two lines.
Notice that in this case the 2nd line does not have to be indented to match the
other lines, because this is all considered to be one line by the Python interpreter.
This example highlights a (minor) disadvantage of Python. Because it cares about
the spaces between the different formats, we cannot add spaces in the top line so
that the formats and their variables line up perfectly. Aligning things this neatly
is probably more trouble than it's worth, but it certainly makes the code easier to
understand.

7.3.1 Numpy mathematical functions

We used the numpy sine, cosine and tangent function in the trigtable program. Here
are more numpy math functions:
absolute(x)     absolute value
arccos(x)       arccosine
arcsin(x)       arcsine
arctan(x)       arctangent
arctan2(y,x)    arctangent of y/x in correct quadrant (***very useful!)
cos(x)          cosine
cosh(x)         hyperbolic cosine
exp(x)          exponential
log(x)          natural logarithm
log10(x)        base 10 log
sin(x)          sine
sinh(x)         hyperbolic sine
sqrt(x)         square root
tan(x)          tangent
tanh(x)         hyperbolic tangent

Note that the arcsin, etc., functions have different names than Fortran and C (which
use asin, acos, etc.)
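To show why arctan2 earns its "very useful" label, here is a small sketch comparing it with arctan for a point in the third quadrant. The point (-1, -1) is my own choice. Dividing y by x throws away the signs, so arctan cannot tell this point from (1, 1), whereas arctan2 is given y and x separately and returns the angle in the correct quadrant.

```python
import numpy as np

x, y = -1., -1.          # a point in the third quadrant

# arctan sees only the ratio y/x = 1, the same as for the point (1, 1).
a1 = np.degrees(np.arctan(y/x))

# arctan2 gets y and x separately, so it knows the quadrant.
a2 = np.degrees(np.arctan2(y, x))

print("arctan(y/x)  = %7.2f degrees" % a1)
print("arctan2(y,x) = %7.2f degrees" % a2)
```

Note the argument order: y comes first, then x.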

7.3.2 Possible integer/real problems

As an aside, note that the trigtable program uses:

degrad = 180./3.1415927

rather than simply


degrad = 180/3.1415927
The reason is to make completely sure that the program will compute a real
quotient and not an integer quotient. In fact, this caution is not needed in this case,
as the following program demonstrates:
#! /usr/bin/env python
print '2/3 = ', 2/3
print '2./3 = ', 2./3
print '2/3. = ', 2/3.
print '2./3. = ', 2./3.
Running the program yields:
2/3 = 0
2./3 = 0.666666666667
2/3. = 0.666666666667
2./3. = 0.666666666667
As long as one part of the fraction is real, the program will compute a real
quotient. It is only when both numbers are written as integers that the result is
truncated. However, I have gotten into the habit of always including the decimal
point in real expressions to avoid someday accidentally writing something like:
a = np.sin(phi/degrad) + (2/3) * np.cos(theta/degrad)**2
which will definitely produce the wrong answer!
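The results above are for Python 2, which these notes use. In Python 3 the / operator always returns a real quotient, so 2/3 gives 0.666..., and truncating integer division is written with a separate // operator, which Python 2 also accepts. The sketch below sticks to forms that give the same answer in either version; the numbers are arbitrary.

```python
# Floor (integer) division: the same result in Python 2 and Python 3.
q_int = 7 // 2
print("7 // 2     = %d" % q_int)

# Including a decimal point forces a real quotient in either version.
q_real = 7 / 2.
print("7 / 2.     = %.1f" % q_real)

# So does converting one operand with float().
q_conv = float(7) / 2
print("float(7)/2 = %.1f" % q_conv)
```

Writing // when you really want an integer quotient, and a decimal point or float() when you do not, keeps the code unambiguous whichever version runs it.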

7.4 More about Python for loops

The for loop in the trigtable program,


for theta in range(0, 90, 1):
mimics the do loop syntax in Fortran. However, the Python for loop is a much more
versatile operator than the Fortran do loop. Here are some examples:
for x in [1, 10, 100, 1000]:            # x will assume these 4 values in the loop

for name in ["Sue", "Dave", "Mary"]:    # name will assume these 3 names

names = ("Sue", "Dave", "Mary")         # example of tuple (see below)
for name in names:                      # name will assume these 3 names
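The loop headers above need a body before they will run. Here is a complete sketch, using the same three names, that loops directly over a tuple and then uses the built-in enumerate function when a counter is also wanted. The lengths list is just something of mine to collect results in. It runs under Python 2 or 3.

```python
names = ("Sue", "Dave", "Mary")

# Loop over the items themselves; no index variable is needed.
lengths = []
for name in names:
    lengths.append(len(name))
    print("%-6s has %d letters" % (name, len(name)))

# enumerate supplies a counter (starting at 0) along with each item.
for i, name in enumerate(names):
    print("%d %s" % (i, name))
```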

7.4.1 More about formats

There are many different formats that can be specified. Here are some common examples:
%8i       8 characters for integer, right justified
%10.4f    10 character field width, 4 to right of decimal
%10.3e    10 characters in scientific notation, 3 to right of decimal
%10.4g    use either f or e as appropriate
%7s       7 characters of string output

Flags can go immediately after the % to customize these:
-    left justify (e.g., %-8i)
+    include + sign (e.g., %+8i)
0    zero fill on left (e.g., %09i)
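A quick way to get a feel for these formats and flags is to print the same value several ways with brackets around each field, so the padding is visible. The values 42 and 3.14159265 below are arbitrary choices of mine, and the script runs under Python 2 or 3.

```python
n = 42
x = 3.14159265

# Brackets mark the edges of each field so the padding can be seen.
rows = [
    "[%8i]" % n,        # right justified in 8 characters
    "[%-8i]" % n,       # left justified
    "[%+8i]" % n,       # sign always shown
    "[%09i]" % n,       # zero filled to 9 characters
    "[%10.4f]" % x,     # fixed point, 4 digits after the decimal
    "[%10.3e]" % x,     # scientific notation, 3 digits after the decimal
    "[%7s]" % "abc",    # string, right justified in 7 characters
]
for row in rows:
    print(row)
```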

ASSIGNMENT P2
Write a Python program to print a table of x, sinh(x), and cosh(x) (the hyperbolic sine and cosine) for values of x ranging in radians from 0.0 to 6.0 at increments
of 0.5. Use a suitable format to make nicely aligned columns of numbers. NOTE:
sinh and cosh are included in numpy (np.sinh and np.cosh, as listed above) and also
in the math module. To use the math versions, include the line import math and
then write math.sinh, etc., to get the functions.

7.5 Input using keyboard

So far all of our example programs have run without prompting the user for any
information. To expand our abilities, let's learn how to input data from the keyboard.
In most programs, we will want to first prompt the user to input the data, so here
is an example of how to input a number:
a = float(raw_input("Enter number here: "))
raw_input is preferred over input for reasons I don't fully understand that have
to do with hacking issues (in Python 2, input evaluates whatever the user types as
a Python expression). The prompt "Enter number here: " will print on the
screen and the user types the number to the right of this and hits return. The
input is ALWAYS a string variable, so we convert it to a real number using the float
function.
We can write a program to multiply two numbers as follows (usermult.py):

#! /usr/bin/env python
a = float(raw_input("Enter first number: "))
b = float(raw_input("Enter second number: "))
c = a*b
print 'Product = ', c
This is a little clunky because ideally we might want to input the numbers on
the same line. This is surprisingly complicated to do in Python (at least as far as
I could tell, more experienced users can correct me). Here is one way to do this
(usermult2.py):
#! /usr/bin/env python
a, b = [float(x) for x in raw_input("Enter two numbers: ").split()]
c = a*b
print 'Product = ', c
The logic is as follows: The raw input will be a string, such as "5 23", which we
then need to split into two parts. This is done using the split operator by appending
.split() to the string. We now have the two strings "5" and "23" and we use a for
loop to assign x to each in turn and convert to the real variables a and b using the
float function. Note that a and b can be assigned together, i.e., in Python you can
write
a, b = 1, 2
to assign two numbers in one line. This program will work correctly as long as
a blank is used to separate the numbers being input. If the user enters a comma
following the first number, it will crash because it will try to convert something like
"2," to a number. That is, the .split operator looks for blanks, not commas.
As a long-time Fortran programmer, you will have a hard time convincing me
that this syntax is easier to learn than:
print *, "Enter two numbers"
read *, a, b
which is how you input two numbers in Fortran. In addition, the Fortran version
will accept a comma between the numbers without giving an error.
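For what it is worth, the Python version can be made to accept commas too with one extra step: replace any commas with blanks before splitting. The sketch below wraps this in a helper function, parse_numbers, which is a hypothetical name of mine and not something built into Python. To keep the example self-contained it is applied to fixed strings; in a real program the string would come from raw_input.

```python
def parse_numbers(text):
    """Hypothetical helper: turn '5 23', '5,23' or '5, 23' into a list of floats."""
    # Replace commas with blanks, then split on blanks as before.
    return [float(x) for x in text.replace(',', ' ').split()]

# In a real program the string would come from raw_input("Enter two numbers: ").
a, b = parse_numbers("5, 23")
print("Product = %g" % (a*b))
```

The unpacking a, b = ... still fails if the user types anything other than exactly two numbers, so this is a convenience rather than full error checking.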

7.6 If statements

Next, let's modify this program so that it will allow the user to continue entering
numbers until he/she wants to stop (usermult3.py):
#! /usr/bin/env python
while (1 < 2):
    a, b = [float(x) for x in raw_input \
        ("Enter two numbers: ").split()]
    if (a==0 and b==0): break
    c = a*b
    print 'Product = ', c
In this case we use a while loop, which continues until the following statement is no
longer true. Since 1 is always less than 2, the loop will continue forever, unless we
explicitly break out of it. But of course we could include a variable in the expression
as in:
x = 1
while (x < 11):
    ...
    ...
    x = x + 1
in which case x will assume the values from 1 to 10 (easy to also do with a for loop).
The program will allow the user to continue entering numbers to be multiplied.
When the user wishes to stop the program (in a more elegant way than hitting
[CNTRL] C!), he/she enters zeros for both arguments. The if statement checks
for this and exits the loop in this case:
if (a==0 and b==0): break
The break statement (exit in Fortran) breaks out of the while loop. In this case,
we just execute a single command, but we could also execute a block of code, e.g.
(usermult4.py):
#! /usr/bin/env python
while (1 < 2):
    a, b = [float(x) for x in raw_input \
        ("Enter two numbers (zeros to stop): ").split()]
    if (a==0 and b==0):
        print 'You entered two zeros'
        print 'so the program will now end'
        break
    c = a*b
    print 'Product = ', c
Notice that this block must be indented relative to the if statement line, and that
there is no end if line (just like there is no end do line for the while loop). The
end of the if block is shown simply by the end of the indentation.

The parentheses in
while (1 < 2):
are optional, i.e.,
while 1 < 2:
will work the same. But I think it looks nicer with them, and you will get less
confused when you switch between computer languages if you always leave them in.
Getting the while loop to go forever by using 1 < 2 is a bit of a kludge. It's
probably better to write
while (True):
which uses the boolean expression True (similarly False is also permitted). Note
that we could use a variable for this purpose, i.e.,
b = True
while (b):
or even the bizarre
b = 1 < 2
while (b):
A list of relational operators (e.g., used in while and if statements) in different
languages is as follows:

FORTRAN 77   FORTRAN 90   C     MATLAB   PYTHON   meaning
.eq.         ==           ==    ==       ==       equals
.ne.         /=           !=    ~=       !=       does not equal
.lt.         <            <     <        <        less than
.le.         <=           <=    <=       <=       less than or equal to
.gt.         >            >     >        >        greater than
.ge.         >=           >=    >=       >=       greater than or equal to
.and.        .and.        &&    &        and      and
.or.         .or.         ||    |        or       or

These operators can be combined to make complex tests, e.g.,


if ( (a > b and c <= 0) or d == 0):

There is very likely a specific order of operations for these things which I can't
remember very well. Look it up in a book if you are unsure or, better, just put in
enough parentheses to make it completely clear to anyone reading your code.
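For the record, the order in Python is: comparisons (==, <, etc.) are evaluated first, then not, then and, and finally or. The sketch below, with test values that I picked so the grouping matters, shows that an expression without parentheses behaves like the version grouped around the and, not the version grouped around the or.

```python
a, b, c, d = 1, 2, 5, 0

# No parentheses: "and" binds more tightly than "or".
without = a > b and c <= 0 or d == 0

# The two possible groupings, written out explicitly.
grouped_and_first = (a > b and c <= 0) or d == 0
grouped_or_first = a > b and (c <= 0 or d == 0)

print("without parentheses: %s" % without)
print("(a > b and c <= 0) or d == 0: %s" % grouped_and_first)
print("a > b and (c <= 0 or d == 0): %s" % grouped_or_first)
```

Since the two groupings give different answers here, the parentheses are worth typing even when they match the default.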
One nice aspect of Python compared to C is that if you make a mistake and
type, for example,
if (a = 0):
you will get a syntax error before the program runs. In C this is a valid statement
with a completely different meaning than is intended!

7.7 If, elif, else constructs

A more versatile form of the if statement is as follows:


if (logical expression):
    (block of code)
elif (logical expression):
    (block of code)
elif (logical expression):
    (block of code)
.
.
else:
    (block of code)
Note that elif is the Python version of the F90 else if and that the blocks of
code can contain many lines. As many elif statements as required can be used. At
most, one block of code will be executed (once one of the if tests is satisfied, it
does not check the others). The final else will be executed if none of the preceding
if statements is true. The final else is optional.
Here is a demonstration program (usersqrt.py) that repeatedly prompts the user
for a positive real number. If it is negative, it asks the user to try again. If it is
positive, it computes and displays the square root using the numpy.sqrt() function.
If the user enters zero, the program stops.
#! /usr/bin/env python
import numpy as np
while (True):
    a = float(raw_input("Enter positive real number (0 to stop) "))
    if (a < 0):
        print 'This number is negative!'
        continue
    elif (a == 0):
        break
    else:
        b = np.sqrt(a)
        print 'sqrt = ', b

This program is very similar to the F90 version we saw earlier. The continue line
(optional in this case) continues the while loop and serves the same function as
the F90 cycle command. The break line leaves the while loop and thus ends the
program. Recall that exit is the F90 equivalent.
ASSIGNMENT P3
Write a Python program to repeatedly ask the user for the constants a, b, and c
in the quadratic equation a*x**2+b*x+c=0. Using the quadratic formula, have the
program identify and compute any real roots. Output the number of real roots and
their values. Stop the program if the user enters zeros for all three values. HINTS:
If you have trouble getting Python to read in all three numbers on one line, feel free
to enter them on three separate lines. Test your program for some simple examples
to make sure it is working correctly (a=1, b=2, c=-3 should return -3 and 1).

7.8 Greatest common factor example

Here is a program that illustrates some of the concepts that we just learned:
#! /usr/bin/env python
# compute the greatest common factor of two integers
while (True):
    a, b = [int(x) for x in raw_input \
        ("Enter two numbers (zeros to stop): ").split()]
    if (a == 0 and b == 0):
        break
    else:
        for i in range(1, min(a, b) + 1):
            if (a % i == 0 and b % i == 0):
                imax = i
        print 'Greatest common factor = ', imax
The structure is similar to the corresponding F90 gcf program, but it has fewer
lines due to the lack of enddo and endif statements. Note that a % i computes the
modulus (remainder) of a/i. Note that a and b must be integers, so we changed the
user input line to int(x) from the float(x) that we used for the earlier programs.
Finally, note that we add one to min(a,b) because the Python upper limit is one

less than the number in the range argument (this is hard to remember for F90
programmers!).
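The loop above tries every candidate from 1 up to the smaller number, which is easy to follow but slow for large inputs. As an aside, here is a sketch of a different and much faster technique, Euclid's algorithm, which repeatedly replaces the pair (a, b) by (b, a % b) until the remainder is zero. This is not the method used in gcf.py; the function names gcf_euclid and gcf_brute are my own, and the brute-force version is included only to check that the two agree.

```python
def gcf_euclid(a, b):
    """Greatest common factor of two positive integers by Euclid's algorithm."""
    while b != 0:
        a, b = b, a % b      # the remainder shrinks every pass
    return a

def gcf_brute(a, b):
    """Same answer by trying every candidate, as in the program above."""
    imax = 1
    for i in range(1, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            imax = i
    return imax

for a, b in [(12, 18), (17, 5), (100, 75)]:
    print("gcf(%d, %d) = %d (brute force gives %d)"
          % (a, b, gcf_euclid(a, b), gcf_brute(a, b)))
```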
ASSIGNMENT P4
Modify gcf.py to compute the least common multiple of two integers.

7.9 User defined functions

(To motivate this, let's repeat the F90 notes here:) As the length and complexity
of a computer program grows, it is a good strategy to break the problem down into
smaller pieces by defining functions or subroutines to perform smaller tasks. This
provides several advantages:
1. You can test these pieces individually to see if they work before trying to get
the complete program to work.
2. Your code is more modular and easier to understand.
3. It is easier to use parts of the program in a different program.
To illustrate how to define your own function in Python, here again is the greatest
common factor program:
#! /usr/bin/env python
# use a function to compute the greatest common factor of two integers
def GETGCF(a, b):
    for i in range(1, min(a, b) + 1):
        if (a % i == 0 and b % i == 0):
            imax = i
    return imax
while (True):
    a, b = [int(x) for x in raw_input \
        ("Enter two numbers (zeros to stop): ").split()]
    if (a == 0 and b == 0):
        break
    else:
        imax = GETGCF(a, b)
        print 'Greatest common factor = ', imax
Because Python is not compiled (it just interprets your script line by line), you
define your functions BEFORE the main program. This is opposite to the convention
in Fortran, where functions and subroutines normally follow the main program. To

define a function, you must start with def and then the name of the function and
any arguments:
def GETGCF(a, b):
Notice the syntax: a and b are passed into the function from the calling program.
imax is returned to the calling program with the return statement. The contents of
GETGCF are indented; we know the function ends when the indentation ends. Of
course, we don't really need the imax in the main program; we could have simply
written:
    else:
        print 'Greatest common factor = ', GETGCF(a, b)
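One advantage listed above is that a function can be tested on its own. The following is a minimal sketch of such a check, not part of the original notes. It assumes Python 3 (print is a function there) and uses the lowercase name getgcf and the standard-library math.gcd as a reference; the test pairs are arbitrary.

```python
# Python 3 sketch: check a user-defined function on its own.
from math import gcd  # standard-library reference implementation


def getgcf(a, b):
    """Greatest common factor by trial division, as in the notes."""
    imax = 1
    for i in range(1, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            imax = i
    return imax


# Compare against math.gcd for a handful of positive integer pairs.
pairs = [(12, 18), (7, 13), (100, 75), (9, 9), (1, 50)]
for a, b in pairs:
    assert getgcf(a, b) == gcd(a, b)
print("all checks passed")
```

Initializing imax to 1 before the loop is a small change from the version in the text; it keeps the function from failing if the loop body never runs.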
In the F90 notes, we introduce subroutines at this point. I'm not sure if there
is an exact equivalent in Python, that is, something you call with variables in an
argument list that can be used for both input and output from the subroutine.
However, Python functions are more versatile than Fortran functions because they
can return more than one value. Thus, we will do the SPH_AZI example using a
function in userdist.py:
#! /usr/bin/env python
#
# SPH_AZI computes distance and azimuth between two points on sphere
#
# Inputs:  flat1 = latitude of first point (degrees)
#          flon1 = longitude of first point (degrees)
#          flat2 = latitude of second point (degrees)
#          flon2 = longitude of second point (degrees)
# Returns: delta = angular separation between points (degrees)
#          azi   = azimuth at 1st point to 2nd point, from N (deg.)
#
# Notes:
# (1) applies to geocentric not geographic lat,lon on Earth
# (2) This routine is inaccurate for delta less than about 0.5 degrees.
#     For greater accuracy, use double precision or perform a separate
#     calculation for close ranges using Cartesian geometry.
#
def SPH_AZI(flat1, flon1, flat2, flon2):
    import numpy as np
    if ( (flat1 == flat2 and flon1 == flon2) or \
         (flat1 == 90. and flat2 == 90.) or \
         (flat1 == -90. and flat2 == -90.) ):
        delta = 0.
        azi = 0.
        return delta, azi
    raddeg = np.pi/180.
    theta1 = (90. - flat1)*raddeg
    theta2 = (90. - flat2)*raddeg
    phi1 = flon1*raddeg
    phi2 = flon2*raddeg
    stheta1 = np.sin(theta1)
    stheta2 = np.sin(theta2)
    ctheta1 = np.cos(theta1)
    ctheta2 = np.cos(theta2)
    cang = stheta1*stheta2*np.cos(phi2 - phi1)+ctheta1*ctheta2
    ang = np.arccos(cang)
    delta = ang/raddeg
    sang = np.sqrt(1. - cang*cang)
    caz = (ctheta2 - ctheta1*cang)/(sang*stheta1)
    saz = -stheta2*np.sin(phi1 - phi2)/sang
    az = np.arctan2(saz, caz)
    azi = az/raddeg
    if (azi < 0.): azi = azi + 360.
    return delta, azi

while (True):
    lat1, lon1 = [float(x) for x in raw_input \
                  ("Enter first point lat, lon: ").split()]
    lat2, lon2 = [float(x) for x in raw_input \
                  ("Enter second point lat, lon: ").split()]
    delta, azi = SPH_AZI(lat1, lon1, lat2, lon2)
    print 'delta, azi = ', delta, azi
Notice that delta and azi are not part of the argument list, but both are returned
from SPH_AZI when we write:
delta, azi = SPH_AZI(lat1, lon1, lat2, lon2)
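On the question of a subroutine equivalent, here is a hedged sketch (Python 3 syntax; the function names minmax and scale_in_place are made up for illustration). Returning several values really returns one tuple that is unpacked by the caller, and a mutable argument such as a list can be changed in place, which is the closest Python gets to a Fortran in/out argument.

```python
def minmax(values):
    """Return two results at once; Python packs them into a tuple."""
    return min(values), max(values)


def scale_in_place(values, factor):
    """Modify the caller's list, much like an in/out subroutine argument."""
    for i in range(len(values)):
        values[i] = values[i] * factor


data = [3.0, 1.0, 2.0]
lo, hi = minmax(data)        # tuple unpacking into two names
result = minmax(data)        # or keep the tuple whole
scale_in_place(data, 10.0)   # data itself is changed; nothing is returned
print(lo, hi, result, data)
```

Note that this only works for mutable objects (lists, numpy arrays). Assigning a new value to a plain number inside a function has no effect on the caller's variable.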

7.9.1 Python keywords

When I first tried translating this program from the F90 version, I had trouble
because it turns out that del is a special word in Python and is used to delete
items from lists (we have not talked about this yet). So I had to change the variable
name del to delta to get it to work. If you use Xcode to do your editing, then the
keywords show up in different colors, so you have a clue that you should not use
them for variable names. Here are the keywords in basic Python (version 2):


and       del      from    not     while
as        elif     global  or      with
assert    else     if      pass    yield
break     except   import  print
class     exec     in      raise
continue  finally  is      return
def       for      lambda  try

DO NOT USE THESE AS VARIABLE NAMES! Most of these words kind of sound
like things that make sense at some level, although we aren't going to discuss them
until we need to. But the one that really bugs me is lambda, as one would like to
think that a Greek letter would always be OK to use in mathematical expressions
(see footnote 1).
ANOTHER WARNING: Another potential pitfall is using certain special names
for your Python script. I ran into this recently when I called one of my programs
string.py. The program worked fine, but created a module that prevented numpy
from working in my other Python scripts! The problem is almost certainly that
Python's standard library already has a module called string, and a string.py in
the current directory is found first when other code tries to import string. Note
that string is not one of
the Python keywords. It would be good to find a complete list of program names
to avoid in Python, but I have not found this yet. Can any of you find such a list
or a way to generate it?
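One possible way to generate such lists, sketched in Python 3 (an assumption; the notes use Python 2, where print and exec are also keywords). The keyword module and sys.builtin_module_names are in the standard library; sys.stdlib_module_names only exists in Python 3.10 and later, so the sketch falls back to an empty set when it is missing. The variable name risky_script_names is made up.

```python
import keyword
import sys

# Reserved words for the Python version that is running this script.
print(keyword.kwlist)

for name in ("del", "lambda", "delta", "string"):
    print(name, keyword.iskeyword(name))

# Names of modules compiled into the interpreter.
builtin_modules = set(sys.builtin_module_names)

# Python 3.10 and later also list the whole standard library.
stdlib_modules = set(getattr(sys, "stdlib_module_names", ()))

risky_script_names = sorted(builtin_modules | stdlib_modules)
print(len(risky_script_names), "module names to avoid for scripts")
```

This does not cover third-party packages such as numpy; a script called numpy.py in your working directory would cause the same kind of trouble.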

7.9.2 Using functions in separate files

If we want to use SPH_AZI in multiple programs, it's annoying and inefficient to
have to include it in each script. To avoid this, we can simply include the function
in a separate script, named, for example, sph_subs.py:
# SPH_AZI computes distance and azimuth between two points on sphere
#
    .
    .
    .
    az = np.arctan2(saz, caz)
    azi = az/raddeg
    if (azi < 0.): azi = azi + 360.
    return delta, azi
and then our userdist2.py script can have the form:
#! /usr/bin/env python
import sph_subs as ss
while (True):
    lat1, lon1 = [float(x) for x in raw_input \
                  ("Enter first point lat, lon: ").split()]
    lat2, lon2 = [float(x) for x in raw_input \
                  ("Enter second point lat, lon: ").split()]
    delta, azi = ss.SPH_AZI(lat1, lon1, lat2, lon2)
    print 'delta, azi = ', delta, azi
Note that we drop the .py suffix when we import sph_subs and that we choose ss
as the prefix to use before the function names (analogous to using np as the prefix
for the numpy functions).
The sph_subs.py script must be in the same directory as userdist2.py. When you
run the userdist2.py script, a file called sph_subs.pyc is automatically created. This
is a binary file (don't try to edit it), which helps future scripts to load faster.

Footnote 1: Incidentally, it is not easy to understand what lambda actually does, at least for a Python
newbie like me. One website says that it is "best described as a tool for building callback handlers."
Thanks, that really clears things up!
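On the lambda keyword from the keyword list: a short, hedged illustration in Python 3 of what it does. A lambda expression defines a small unnamed function in a single expression. The station names and latitudes below are made-up values used only for the example.

```python
# Two equivalent ways to define a function that squares its argument.
def square(x):
    return x * x

square2 = lambda x: x * x

# The usual use: pass a small throwaway function to another function.
# (station name, latitude) pairs; hypothetical values
stations = [("sta1", 33.6), ("sta2", 34.9), ("sta3", 37.9), ("sta4", 21.4)]
by_latitude = sorted(stations, key=lambda s: s[1])

print(square(4), square2(4))
print([name for name, lat in by_latitude])
```

Here the lambda tells sorted to order the pairs by their second element, which saves writing a separate def for a one-line function.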

7.10 Arrays

Here are some examples of how to define arrays in Python using numpy:
#! /usr/bin/env python
import numpy as np
a = np.ndarray(shape=(100), dtype=float)
ii = np.ndarray(shape=(50,2), dtype=int)
In this case, a is defined as a vector (1-D array) with 100 real elements and ii
is defined as a 50 x 2 matrix of integers. Annoyingly, like C, array indices start
with zero, not one. Thus, in the above example the a array has values from a[0]
to a[99], and the ii array includes ii[0,0] but not ii[50,2]. Notice that in Python,
unlike Fortran, we use brackets, not parentheses, to refer to specific array values.
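A small sketch of the index limits described above, in Python 3. One assumption beyond the notes: it uses np.zeros rather than np.ndarray, because np.ndarray leaves the array contents uninitialized while np.zeros sets every element to zero.

```python
import numpy as np

a = np.zeros(100)                  # 100 floats, all set to 0.0
ii = np.zeros((50, 2), dtype=int)  # 50 x 2 integers, all set to 0

print(a.shape, ii.shape)
print(a[0], a[99])                 # first and last elements of a
ii[49, 1] = 7                      # last row, last column of ii

try:
    ii[50, 2] = 1                  # one past the end in both dimensions
except IndexError as err:
    print("IndexError:", err)
```

With np.zeros there is no need for a separate loop to clear the array before using it.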
Here is an example program that uses an array to compute prime numbers less
than 100:
#! /usr/bin/env python
import numpy as np
maxnum = 100
prod = np.ndarray(shape=(maxnum+1), dtype=int)
for i in range(1, maxnum+1):
    prod[i] = 0
max_i = int(np.sqrt(maxnum))
for i in range(2, max_i+1):
    if (prod[i] == 0):
        max_j = maxnum // i          # integer division
        for j in range(2, max_j+1):
            prod[i*j] = 1
nprime = 0
for i in range(2, maxnum+1):
    if (prod[i] == 0):
        nprime = nprime + 1
        print i
print 'Number of primes found = ', nprime
Note that we set the upper limit of the prod array to maxnum+1, so that the actual
array goes from prod[0] to prod[maxnum] (recall that array indices start with zero).
We don't use prod[0] for anything. Actually, we also don't use prod[1] for anything,
just like in the F90 version.
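As a variation on the program above, the inner j loop can be replaced by a single numpy slice assignment. This is a sketch in Python 3, not the version from the F90 notes; the names is_composite and primes are made up.

```python
import numpy as np

maxnum = 100
is_composite = np.zeros(maxnum + 1, dtype=bool)

for i in range(2, int(np.sqrt(maxnum)) + 1):
    if not is_composite[i]:
        # mark 2*i, 3*i, ... in one statement: start at 2*i, step by i
        is_composite[2 * i::i] = True

primes = [i for i in range(2, maxnum + 1) if not is_composite[i]]
print("Number of primes found =", len(primes))
```

The slice 2*i::i means "start at index 2*i and take every i-th element to the end of the array", which covers the same elements as the j loop.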
Now let's change the code to print out many numbers per line by saving the
prime numbers in a separate array called pnum. Here is the code:
#! /usr/bin/env python
import numpy as np
maxnum = 1000
prod = np.ndarray(shape=(maxnum+1), dtype=int)
pnum = prod.copy()
for i in range(1, maxnum+1):
    prod[i] = 0
max_i = int(np.sqrt(maxnum))
for i in range(2, max_i+1):
    if (prod[i] == 0):
        max_j = maxnum // i          # integer division
        for j in range(2, max_j+1):
            prod[i*j] = 1
nprime = 0
for i in range(2, maxnum+1):
    if (prod[i] == 0):
        nprime = nprime + 1
        pnum[nprime] = i
print 'Number of primes found = ', nprime
print pnum[1:nprime+1]
Notice that we can define the second array, pnum, with the same size as the first array
by simply saying
pnum = prod.copy()
Be careful not to write just pnum = prod: that does not create a new array, it only
makes pnum a second name for the prod array, so that storing a prime in pnum
would also overwrite an element of prod.
The output of this code is:
Number of primes found = 168
[  2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61
  67  71  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151
 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251
 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359
 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593
 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701
 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827
 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953
 967 971 977 983 991 997]

Notice that, like in F90, we can specify a range of array indices using pnum[1:nprime+1],
and that Python adds an opening and closing bracket around the array contents
upon output. Why is the upper limit nprime+1 and not nprime? Because a Python
range or slice stops one short of its upper limit. Recall that we ran into this before
with range and with shape in np.ndarray. It's confusing because of the difference
from the F90 convention.
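Two array behaviors that tend to surprise Fortran programmers can be seen in a few lines. This is a hedged Python 3 sketch with made-up variable names: a slice excludes its upper limit, and assigning one array name to another does not copy the data.

```python
import numpy as np

prod = np.zeros(5, dtype=int)
alias = prod          # second name for the SAME array
copy = prod.copy()    # new array with its own storage

alias[0] = 99         # also changes prod[0]
copy[1] = 42          # prod[1] is untouched

print(prod)
print(alias is prod, copy is prod)

p = np.array([0, 2, 3, 5, 7])
print(p[1:4])         # elements 1, 2, 3 (the upper limit 4 is excluded)
```

In Fortran, b = a copies the array values; in Python it only attaches a second name to the same array.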
The output looks pretty nice, but for completeness, at some point it would be
nice to know how to make an explicitly formatted output table of the primes like we
did in the F90 notes. I was not able to easily find an example of this by Googling
around.
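One way to get an explicitly formatted table is to build each output line with the % format operator, which works like a Fortran format descriptor. This is a sketch in Python 3 with a short hard-coded list of primes; the field width of 6 and the 5 values per line are arbitrary choices.

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
per_line = 5

lines = []
for start in range(0, len(primes), per_line):
    row = primes[start:start + per_line]
    # "%6d" right-justifies each integer in a field 6 characters wide
    lines.append("".join("%6d" % p for p in row))

table = "\n".join(lines)
print(table)
```

This is roughly the equivalent of a Fortran (5i6) format.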

7.11 Character strings

You can define a string variable in Python very simply:


s1 = 'Peter'
and combine strings like this:
s1 = 'Peter'
s2 = 'Shearer'
s3 = s1 + ' ' + s2
There are lots of built-in functions to manipulate strings in Python. Here are some
examples:


#! /usr/bin/env python
s1 = 'Peter Shearer'
print s1[0:5]
print len(s1)
print s1.find('Sh')
print s1.split()
name1, name2 = s1.split()
print name1
print name2
and its output:
Peter
13
6
['Peter', 'Shearer']
Peter
Shearer
Here is the voter program from the F90 notes, translated into Python:
#! /usr/bin/env python
name = raw_input("Who did you vote for in 1992?  ")
if (name.find('ill') != -1 or name.find('lin') != -1):
    print 'Then you are likely a Democrat.'
elif (name.find('eor') != -1 or name.find('ush') != -1):
    print 'Then you are likely a Republican.'
else:
    print 'Then you are likely an independent voter.'
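The same substring test can be written with the in operator instead of find(...) != -1. The sketch below is in Python 3 and wraps the logic in a hypothetical function guess_party so that it can be tried without typing input. Converting the name to lower case first is an addition, so that capitalization does not matter.

```python
def guess_party(name):
    """Same test as the voter program, using the 'in' operator."""
    name = name.lower()   # so that 'CLINTON' and 'clinton' both match
    if "ill" in name or "lin" in name:
        return "Democrat"
    elif "eor" in name or "ush" in name:
        return "Republican"
    return "independent"


for answer in ("Bill Clinton", "GEORGE BUSH", "Ross Perot"):
    print(answer, "->", guess_party(answer))
```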

7.12 I/O with files

Here is the Python version of fileinout.f90, a program that reads pairs of numbers
from an input file, computes their product, and outputs the original numbers and
their product to an output file:
#! /usr/bin/env python
infile = raw_input("Enter input file name: ")
filein = open(infile, 'r')
outfile = raw_input("Enter output file name: ")
fileout = open(outfile, 'w')
while (True):
    line = filein.readline()
    if not line: break
    x, y = [float(a) for a in line.split()]
    z = x * y
    line2 = "%10.3f %10.3f %10.3f \n" % (x, y, z)
    fileout.write(line2)
filein.close()
fileout.close()
Let's examine how it works. First we read and open the input file:
infile = raw_input("Enter input file name: ")
filein = open(infile, 'r')
The first line here is similar to what we used before to input numbers except this
time all we need is the string returned by raw_input. Next we open the file infile and assign
it to the object filein (this is like a unit number in F90). The optional 2nd argument
'r' opens the file for read only. This is good to specify because it will prevent our
accidentally writing on top of it. It also means the file must already exist. Similarly
we read in and open the output file:
outfile = raw_input("Enter output file name: ")
fileout = open(outfile, 'w')
This is almost identical to the input version, except we specify 'w' for write.
Next, we use a while loop to read the input file one line at a time:
while (True):
    line = filein.readline()
    if not line: break
We use the readline method to read line (a string) from the object filein. If there is
nothing more to read, then not line will be true and we exit the while loop. Next
we split line into two parts (this assumes the numbers are separated with blanks and no
commas), convert them to two real numbers (we saw this before when we input
two numbers on one line), and then compute their product:
x, y = [float(a) for a in line.split()]
z = x * y

Next we perform a formatted write of the three numbers to line2 (another string
variable) and then write this to object fileout:
line2 = "%10.3f %10.3f %10.3f \n" % (x, y, z)
fileout.write(line2)
When we are done (having left the while loop with the break), we close the two
files:
filein.close()
fileout.close()
Note the parentheses: filein.close without them only names the method and does
not actually call it.
There are typically many ways to write the same program. This is even more
true in Python than it was in Fortran. Here is a more compact version of fileinout.py:
#! /usr/bin/env python
filein = open(raw_input("Enter input file name: "), 'r')
fileout = open(raw_input("Enter output file name: "), 'w')
for line in filein:
    x, y = [float(a) for a in line.split()]
    z = x * y
    fileout.write("%10.3f %10.3f %10.3f \n" % (x, y, z))
This should be self-explanatory, except for the line
for line in filein:
which is a new syntax and shows how Python will automatically grab one line at a
time from the input file object.
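A related idiom, not used in these notes, is the with statement, which closes the files automatically when the block ends, so that no explicit close call is needed. The sketch below is in Python 3. It writes its own small input file into a temporary directory so that it can be run as is; the file names pairs.txt and products.txt are hypothetical.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
inname = os.path.join(workdir, "pairs.txt")      # hypothetical file names
outname = os.path.join(workdir, "products.txt")

with open(inname, "w") as f:                      # make a small input file
    f.write("1.0 2.0\n3.0 4.5\n")

# Both files are closed automatically when the with block ends.
with open(inname, "r") as filein, open(outname, "w") as fileout:
    for line in filein:
        x, y = [float(a) for a in line.split()]
        fileout.write("%10.3f %10.3f %10.3f \n" % (x, y, x * y))

with open(outname) as f:
    result = f.read()
print(result)
```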
Alternatively, we could read the entire input file in at once:
#! /usr/bin/env python
filein = open(raw_input("Enter input file name: "), 'r')
fileout = open(raw_input("Enter output file name: "), 'w')
lines = filein.readlines()
for line in lines:
    x, y = [float(a) for a in line.split()]
    z = x * y
    fileout.write("%10.3f %10.3f %10.3f \n" % (x, y, z))
filein.close()
fileout.close()


where we use readlines rather than readline in order to input the entire file. Then
notice how we can pull out one line at a time by writing:
for line in lines:
There is probably even a way to do this where we don't loop over the lines, but
somehow directly read x and y as vectors, compute z as a vector, and then output
the vectors directly to the output file. Any Python experts want to try to find this
approach?
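One possible answer, sketched in Python 3 with numpy: np.loadtxt reads a whole text table into arrays and np.savetxt writes one back out, so no explicit loop over lines is needed. The sketch creates its own two-line input file in a temporary directory; the file names are hypothetical.

```python
import os
import tempfile
import numpy as np

workdir = tempfile.mkdtemp()
inname = os.path.join(workdir, "pairs.txt")      # hypothetical file names
outname = os.path.join(workdir, "products.txt")
with open(inname, "w") as f:
    f.write("1.0 2.0\n3.0 4.5\n")

x, y = np.loadtxt(inname, unpack=True)   # one array per column
z = x * y                                # element-by-element product
np.savetxt(outname, np.column_stack((x, y, z)), fmt="%10.3f")

with open(outname) as f:
    result = f.read()
print(result)
```

The unpack=True argument transposes the table so that each column comes back as its own vector.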

7.13 Using tuples and lists

This is a placeholder. For now, look at Lisa's notes.

7.14 Plotting with Python

A big advantage of Python over Fortran is the ability to make plots directly from
Python scripts. There are a variety of plot packages available for Python. We are
going to use matplotlib because this is what Lisa uses. Please consult her notes for
more details.
Let's start with an example program (xyplot.py):
#! /usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
x = [1., 2., 3., 4.]
y = [1., 1.9, 3.2, 4.]
pylab.plot(x, y)
pylab.show()
which produces Figure 7.1. The first three lines are necessary to import the plotting
packages and define some of their attributes. Next, we define two lists of 4 numbers
each:
x = [1., 2., 3., 4.]
y = [1., 1.9, 3.2, 4.]
These are not the same as the arrays that we used earlier. In fact they could be
a mixture of different types of variables (e.g., int, float, string), but in this example
they need to be of the same type and size. Next, we plot the points and display the
result:
pylab.plot(x, y)
pylab.show()
which brings up the pop-up window. From the window, you can save the plot by
clicking on the little microdisk icon. You can save it in different formats (e.g., .eps,
.png) by using the appropriate suffix on the file name. You must close the window
(click on the red button in the far upper left) to get back to your terminal window.

Figure 7.1: The pop-up window that results from xyplot.py.

Now let's improve the appearance of this plot and add some labels (xyplot2.py):
#! /usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
x = [1., 2., 3., 4.]
y = [1., 1.9, 3.2, 4.]
pylab.xlim(0.5, 4.5)
pylab.ylim(0.5, 4.5)
pylab.plot(x, y, 'bs')
pylab.xlabel('X axis')
pylab.ylabel('Y axis')
pylab.title('My x-y points')
pylab.show()

Figure 7.2: The pop-up window that results from xyplot2.py.


which produces Figure 7.2. In this case we define the plot limits so that the endpoints
do not sit on the plot axes:
pylab.xlim(0.5, 4.5)
pylab.ylim(0.5, 4.5)
and we plot the points with blue squares:
pylab.plot(x, y, 'bs')
where the optional pylab.plot 3rd argument can indicate various colors, symbols,
and line types ('r'=red, 'b'=blue, 'g'=green, 'c'=cyan, 'm'=magenta, 'y'=yellow, 'k'=black,
'w'=white; '+'=plus, '.'=dot, 'o'=circle, '*'=star, 'p'=pentagon, 's'=square, 'x'=x, 'D'=diamond,
'h'=hexagon, '^'=triangle; '-'=solid line, '--'=dashed line, ':'=dotted line,
'-.'=dash-dotted line, 'None'=no connecting lines).


7.14.1 Example of computing least-squares line

As an example to illustrate both Python matrix math and plotting capabilities, let's
compute and plot the best-fitting line to the points from the last section, i.e., (1, 1),
(2, 1.9), (3, 3.2), and (4, 4). Let the equation of the line be y = a + bx, where a is
the y-intercept and b is the slope. Then we can express the desired fit to the data
as the matrix equation:

\[
\begin{pmatrix} 1 \\ 1.9 \\ 3.2 \\ 4 \end{pmatrix} =
\begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
\qquad (7.1)
\]

where the goal is to adjust a and b so that the r.h.s. best fits the l.h.s. Note that the
4x2 matrix in this case consists of a column vector of ones (which are multiplied
by a) and a column vector containing the x values of the data points (which are
multiplied by b, the slope).
This is a standard form for linear problems: d = Gm where d is a data vector
with the y-values of the data, G is the linear operator for predicting the data
from the model, and m is the desired model. In this case the model is simply the
coefficients a and b of the line.
Our program will need to import numpy to do basic matrix arithmetic. To do
more advanced matrix operations, such as computing inverses and eigenvalues, we
also import the linear algebra package numpy.linalg
(see, e.g., http://docs.scipy.org/doc/numpy/reference/routines.linalg.html):
import numpy as np
import numpy.linalg as lin
In the last section, we defined the x and y vectors as lists, but we now need to
make them matrices, using the numpy mat method:
x = np.mat([1, 2, 3, 4])
y = np.mat([1.0, 1.9, 3.2, 4.0])
Similarly, we set up the 4x2 G matrix:
G = np.mat([[1,1], [1,2], [1, 3], [1, 4]])
and the 4x1 column vector d, given by y^T:
d = y.T


where .T is the transpose operator. In our case, m will be a 2x1 column vector that
contains the y-intercept and slope of the best fitting line. To solve for m, we apply
the standard formula for the least-squares solution (see footnote 2) to d = Gm:

\[ m = (G^T G)^{-1} G^T d \qquad (7.2) \]

Translated to Python, this becomes


m = lin.inv(G.T * G) * G.T * d
print 'm = \n', m
where lin.inv is the linalg matrix inverse function. The print statement is to show
the coefficients a and b in the best-fitting line, i.e.,
m =
[[-0.05]
[ 1.03]]
The best-fitting line is of the form y = -0.05 + 1.03 x. To plot this line, we
compute the y values for x values of 0.8 and 4.2:
x2 = np.mat([0.8, 4.2])
d2 = np.mat([[1, 0.8], [1, 4.2]]) * m
Putting it all together, here is the complete program:
#! /usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
import numpy as np
import numpy.linalg as lin      #matrix ops, such as inverse and determinant

x = np.mat([1, 2, 3, 4])
y = np.mat([1.0, 1.9, 3.2, 4.0])
G = np.mat([[1,1], [1,2], [1, 3], [1, 4]])
d = y.T                         #data vector

m = lin.inv(G.T * G) * G.T * d  #model vector for least-squares fit
print 'm = \n', m

x2 = np.mat([0.8, 4.2])
d2 = np.mat([[1, 0.8], [1, 4.2]]) * m

pylab.xlim(0.5, 4.5)
pylab.ylim(0.5, 4.5)
pylab.plot(x, y, 'sb')
pylab.plot(x2.T, d2, 'r-')      #(x2, d2.T) fails for lines, but works for symbols (bug?)
pylab.xlabel('X axis')
pylab.ylabel('Y axis')
pylab.title('My best-fitting line')
pylab.show()
which produces the plot shown in Figure 7.3. I had trouble getting the red line to plot
using the matrix input to pylab.plot. For some weird reason (bug?), pylab.plot(x2,
d2, 'sr') works, but pylab.plot(x2, d2, 'r-') fails. In addition, pylab.plot(x2, d2.T,
'sr') works and pylab.plot(x2, d2.T, 'r-') fails. The only way I succeeded in plotting
the line is pylab.plot(x2.T, d2, 'r-'). This makes no sense to me! (A likely
explanation: np.mat objects are always two-dimensional, and pylab.plot treats each
column of a 2-D input as a separate data set. A 1x2 row matrix such as x2 is then
plotted as two data sets of one point each; single points show up as symbols, but
a line needs at least two points. The 2x1 column vectors x2.T and d2 give one data
set with two points, which can be joined by a line.)

Figure 7.3: The pop-up window that results from xyplot3.py.

Footnote 2: Note to Geophysics Graduate students: The formula for m given here is the sort of thing that
you are expected to know how to derive before you take your departmental exam. See a linear
algebra or statistics text for details.


Note that this problem can also be solved using the numpy.linalg least-squares
solution function lstsq:
m, residues, rank, s = lin.lstsq(G, d)
print 'm = \n', m
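The same fit can also be done with ordinary numpy arrays instead of np.mat. This is a sketch in Python 3 under two assumptions that go beyond the notes: it solves the normal equations with np.linalg.solve rather than forming the inverse explicitly, and it passes rcond=None to lstsq, which needs a reasonably recent numpy.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
d = np.array([1.0, 1.9, 3.2, 4.0])
G = np.column_stack((np.ones_like(x), x))   # 4 x 2: column of ones, column of x

# Normal equations (G^T G) m = G^T d, solved without forming the inverse.
m = np.linalg.solve(G.T.dot(G), G.T.dot(d))

# Same answer from the library least-squares routine.
m2 = np.linalg.lstsq(G, d, rcond=None)[0]

print("a, b =", m)
```

With plain 1-D arrays, the fitted line can be passed to pylab.plot directly, without any transposes.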


Chapter 8

LaTeX
Someday you will want to publish the results of your research. IGPP scientists
typically use either Word or LaTeX for this purpose (some old timers still use troff, a
UNIX precursor to LaTeX). Most of you probably already know how to use Word.
It is very easy to use (WYSIWYG) and is the standard word processor of choice
in many institutions. It is not, however, ideal for writing scientific articles. It has
particular difficulty handling equations and specialized typesetting requirements. A
far better choice is TeX or LaTeX.
Advantages of LaTeX:
1. It is free and open source, so you are not supporting the Evil Empire.
2. You can run it on Macs, PCs or UNIX machines.
3. By using macros, it is a far more powerful and versatile system than all that
pull-down menu crap you find in most commercial word processors.
4. AGU uses LaTeX for its electronic submission of articles and abstracts. They
provide macros to match the style of their journals so that preparing camera-ready copy is trivial.
5. You can produce beautiful looking equations much more easily than with programs like Word.
LaTeX takes a little while to learn at first, but you will find it well worth your
effort. I will describe running LaTeX from the UNIX command line although I
am actually most familiar with using it on my Mac using a fancy implementation
(TeXShop), which you can download from:
http://pages.uoregon.edu/koch/texshop/

8.1 A simple example

Use your favorite text editor to create the file samp1.tex (an example from the Latex
book, Kopka and Daly, A Guide to Latex2e):
\documentclass{article}
\begin{document}
Today (\today) the rate of exchange between the British pound
and the American dollar is \pounds 1 = \$1.63, an increase of
1\% over yesterday.
\end{document}
which will produce:
Today (November 30, 2012) the rate of exchange between the British pound and
the American dollar is £1 = $1.63, an increase of 1% over yesterday.
The document must first define a document class. Options are book, report, article, or letter. The guts of the document are then enclosed between the \begin{document}
and \end{document} commands.
This example highlights some of the special characters and macros in Latex. The
following characters are normally interpreted as Latex commands:
$ & % # _ { }

To print these out in your text, you must precede them with a backslash, i.e.,
\$ \& \% \# \_ \{ \}

In this example, \today invokes a macro to print today's date and \pounds
will print the British pound symbol. From this example you can see that Latex is
not WYSIWYG! To typeset this document, enter:
latex samp1
Assuming you have no errors in the input file, this will generate the file samp1.dvi.
To preview this file, enter:
xdvi samp1


To convert the file to a Postscript file, enter:


dvips samp1 -o samp1.ps
Notice that Latex automatically indented the first line of the paragraph. To
avoid this, use the \noindent command for the target paragraph:
\documentclass{article}
\begin{document}
\noindent
Today (\today) the rate of exchange between the British pound
and the American dollar is \pounds 1 = \$1.63, an increase of
1\% over yesterday.
\end{document}

To globally change the paragraph indenting, you can change the default for this
directly:
\documentclass{article}
\begin{document}
\setlength{\parindent}{0.5in}
Today (\today) the rate of exchange between the British pound
and the American dollar is \pounds 1 = \$1.63, an increase of
1\% over yesterday.
\end{document}

This will now indent all paragraphs by 0.5 inches. To remove paragraph indenting, just set this parameter to zero. Latex ignores the carriage returns at the ends
of each line in the input file. It also ignores extra blanks; it only considers the first
blank. New paragraphs are defined by adding a blank line between blocks of text.

8.2 Example with equations

\documentclass{article}
\begin{document}
\setlength{\parindent}{0.0in}
The function $X(p)$ is more nicely behaved than $T(X)$ since it does not
cross itself (there is a single value of $X$ for each value of $p$), but
the inverse function $p(x)$ is multi valued. An even nicer function is
the combination
\begin{equation}
\tau(p) = T(p) - pX(p),
\end{equation}
where $\tau$ is called the {\it delay time}. It can be calculated very
simply:
\begin{equation}
\tau(p) = 2 \int_0^{z_p} \left[ {u^2 \over {(u^2-p^2)^{1/2}}} -
{p^2 \over {(u^2-p^2)^{1/2}}} \right] \, dz
\end{equation}
or
\begin{eqnarray}
\tau(p) &=& 2 \int_0^{z_p} (u^2-p^2)^{1/2} \, dz \\
&=& 2 \int_0^{z_p} \eta(z) \, dz
\end{eqnarray}
where $\eta$ is the vertical slowness.
\end{document}

which typesets as:


The function X(p) is more nicely behaved than T(X) since it does not cross itself
(there is a single value of X for each value of p), but the inverse function p(x) is
multi valued. An even nicer function is the combination

\[ \tau(p) = T(p) - pX(p), \qquad (8.1) \]

where τ is called the delay time. It can be calculated very simply:

\[ \tau(p) = 2 \int_0^{z_p} \left[ \frac{u^2}{(u^2-p^2)^{1/2}} - \frac{p^2}{(u^2-p^2)^{1/2}} \right] dz \qquad (8.2) \]

or

\[ \tau(p) = 2 \int_0^{z_p} (u^2-p^2)^{1/2} \, dz \qquad (8.3) \]

\[ \phantom{\tau(p)} = 2 \int_0^{z_p} \eta(z) \, dz \qquad (8.4) \]

where η is the vertical slowness.


Note that equations within the text are enclosed with $ signs. \begin{equation}
and \end{equation} are used to put the equation on a separate line. Variables
within the equations are automatically put into italics. Greek letters are defined
as \tau, \eta, etc. Equations are automatically numbered (this can be changed if
desired). Subscripts are defined with the underscore (_), superscripts with the caret (^).
Fractions are written as, for example, {x \over y}. \int is for the integral symbol;
note how the limits are written. Curly brackets are used to separate things; they
do not appear in the typeset version.


The \begin{eqnarray} section is used to align the = signs in the two lines of
equations. Note that &=& is used to define what it is that is being aligned. Every
line except the last line has a carriage return (\\). Although TeX generally does
an excellent job of spacing equations, sometimes some fine tuning will help. In this
case \, is used to place a tiny amount of space before the dz.
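As a hedged alternative to the constructs used above, the same equations can be written with \frac in place of \over and with the align environment in place of eqnarray. This sketch assumes the amsmath package is installed (it is part of standard LaTeX distributions). It also spells out the step between the two forms of the integrand, since (u^2 - p^2)/(u^2-p^2)^{1/2} = (u^2-p^2)^{1/2}, which the example above identifies with the vertical slowness \eta.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align}
\tau(p) &= 2 \int_0^{z_p} \left[ \frac{u^2}{(u^2-p^2)^{1/2}}
           - \frac{p^2}{(u^2-p^2)^{1/2}} \right] \, dz \\
        &= 2 \int_0^{z_p} \frac{u^2-p^2}{(u^2-p^2)^{1/2}} \, dz \\
        &= 2 \int_0^{z_p} (u^2-p^2)^{1/2} \, dz \\
        &= 2 \int_0^{z_p} \eta(z) \, dz
\end{align}
\end{document}
```

In align a single & marks the alignment point, instead of the &=& used by eqnarray, and each line gets its own equation number.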

8.3 Changing the default parameters

8.3.1 Font size

There is a standard font size declared for each document class. In most cases this
is Roman 10 pt. One easy way to change the size is with the following commands,
arranged in increasing order of size:
\tiny
\scriptsize
\footnotesize
\small
\normalsize
\large
\Large
\LARGE
\huge
\Huge

8.3.2 Font attributes

\itshape   =  switch to italics
\slshape   =  switch to slanted font
\upshape   =  switch to upright (normal) font
\bfseries  =  switch to bold
\mdseries  =  switch (back) to regular weight

These commands can be invoked in several different ways to put in situ in italics
and then return to normal font:
\itshape in situ \upshape
OR
{\itshape in situ}


OR
\begin{itshape}
in situ
\end{itshape}
These different options also apply to the font sizes listed above. Here is an
example of how these options can be used:
\documentclass{article}
\begin{document}
Let's test different ways to say \itshape in situ \upshape
in italics. There are several different ways to say
{\itshape in situ} in italics; indeed
\begin{itshape}
in situ
\end{itshape}
can be written in italics in at least three different ways.
Next let's experiment with {\large large}, {\Large Large},
and {\LARGE LARGE} type, as well as {\small small},
{\footnotesize footnotesize}, and {\scriptsize scriptsize}
type.
\end{document}
which produces:
Let's test different ways to say in situ in italics. There are several different ways
to say in situ in italics; indeed in situ can be written in italics in at least three
different ways.
Next let's experiment with large, Large, and LARGE type, as well as small,
footnotesize, and scriptsize type.

8.3.3 Line spacing

There is a default line spacing created whenever a new font is selected. Here are
examples of two different ways that this can be changed:
\setlength{\baselineskip}{10pt}       Change line spacing to 10 point
                                      (single spacing for 10pt font).

\renewcommand{\baselinestretch}{2}    Change normal (single) line spacing
                                      by a factor of 2. This will result
                                      in double spacing. This must be
                                      called before \begin{document} or before
                                      a new font declaration.


Here is an example:
\documentclass{article}
\begin{document}
\setlength{\baselineskip}{20pt}
Today (\today) the rate of exchange between the British pound
and the American dollar is \pounds 1 = \$1.63, an increase of
1\% over yesterday. Let's write one more line here so that we
get up to three lines and can see the spacing better.
\end{document}
which produces:
Today (November 30, 2012) the rate of exchange between the British pound and
the American dollar is £1 = $1.63, an increase of 1% over yesterday. Let's write
one more line here so that we get up to three lines and can see the spacing better.

8.4 Including graphics

Postscript, EPS, and PDF files can easily be embedded as figures in a Latex document by including Latex extension packages such as graphics or graphicx. Here is
an example that uses graphicx:
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\setlength{\baselineskip}{20pt}
We are now going to show how to embed a Postscript file into
a LaTex document using the includegraphics command.
\begin{figure}[h]
\begin{center}
\includegraphics[scale=0.7]{plot_1.pdf}
\end{center}
\caption{Here is the caption for this plot. This will be automatically
positioned below the plot.}
\end{figure}

And then here is some more text to show where the next block of text
will appear. Blah, blah, blah...
\end{document}
which will produce:
We are now going to show how to embed a Postscript file into a LaTex document
using the includegraphics command.


Figure 8.1: Here is the caption for this plot. This will be automatically positioned
below the plot.
And then here is some more text to show where the next block of text will
appear. Blah, blah, blah...
Note that the graphicx package is loaded with the \usepackage{graphicx} command at the start of the file. The \begin{figure} macro has various options for
where the figure will be positioned, including at the present location in the text [h],
or at the top [t] or bottom of the page [b]. These options can be combined, i.e., [tb]
will position the figure at either the top or bottom of the page. In this example,
the figure is scaled to 70% of its original size. Depending upon how the figure is
positioned in the PDF file, you also may need to apply a bounding box to remove
the surrounding white space using the bb= option, e.g.,
\includegraphics[bb=2in 4in 7in 9in, scale=0.7]{plot_1.pdf}
which specifies exactly what part of the page will be windowed and displayed. This
is necessary for PDF figures that appear in only part of an entire page and it can be
tedious to find the right bounding box. In my experience, an easier option is to use
the Mac Preview program to open Postscript or EPS files (from Adobe Illustrator


or other graphics programs) and then save them within Preview as PDF files. In
this case, they are tightly windowed and the bb option is not needed.
Figures are automatically numbered; users have control over the starting figure
number.

8.5 Want to know more?

There is a huge amount of material about Latex on the web. Check out David
McMillan's great Latex example file, ex.tex, which you can find in shearer/CLASS/COMP/LATEX.
The accompanying files psfig.tex, hobbes.ps, and gnufig.tex are also included.
A good Latex reference book is:
Kopka, H., and P.W. Daly, A Guide to Latex 2e, Addison-Wesley, New York,
1995.

Chapter 9

Postscript plotting
PostScript is a page description language that is used by most modern printers
for both text and graphics. You can generate Postscript files automatically from
programs. For example, on the Macs you can print to a file rather than to the
printer, and it will save whatever you were going to print as a Postscript file
(most people save as a PDF file, but there is still an option to save in Postscript).
However, you can also generate your own Postscript files, and that is the subject
of these notes. Postscript is ideally suited for user-generated graphics because it is
in ASCII (plain text), is well documented, and is generally easy to understand.
Let's start with an example Postscript file (mypost1):
newpath
144 72 moveto
288 432 lineto
stroke
showpage
We start with the newpath operator. This empties the current path and
declares that we are starting a new path (because we are at the beginning of the file,
this command is not actually necessary here, but it's a good idea to start with it).
Next we move to the point (144, 72):
144 72 moveto
The default coordinate system for Postscript files is in units of 1/72 of an inch,
measured from the lower left corner of the page. The arguments for moveto are the
x and y coordinates. These numbers are pushed onto a stack and are then read by the
moveto command. Thus, this command moves us to a point 2 inches to the right of
the left page edge and 1 inch above the page bottom. This is the current point
and we can think of this as a pen location.
Next we add a line segment between the current point and a new point at
(288, 432), i.e., (4 inches, 6 inches):
288 432 lineto
You can think of this as a draw command, except that the draw is not actually
executed yet; it is just added to the currently defined path. To draw the path on
the page, we use the stroke command:
stroke
Finally, we display the current page:
showpage
We can preview this file with pageview or ghostview. We can also send it to
a Postscript printer. However, to be sure that the printer recognizes that it is a
Postscript file, we should always start the file with %! PostScript file, i.e. (mypost2),
%! PostScript file
newpath
144 72 moveto
288 432 lineto
stroke
showpage
If you don't start with this, then the printer will just print out the ASCII text.
This is no big deal with a small file like this but is a big problem for a large Postscript
file. You don't want 200 pages of text to come out of the printer when you really
wanted a single page of graphics! If you ever do make this mistake, you will need to
cancel the print job on the printer and/or kill the job on your computer.
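Because a Postscript file is just plain text, any language that can write strings can produce one. Here is a minimal sketch in Python, not part of the original examples; the function name line_ps is made up for illustration. It builds the text of mypost2 from two end points given in the default 1/72 inch units:

```python
def line_ps(x1, y1, x2, y2):
    """Return PostScript text that draws one straight line.

    Coordinates are in the default PostScript units (1/72 inch),
    measured from the lower left corner of the page.
    """
    commands = [
        "%! PostScript file",   # first line: lets the printer recognize PostScript
        "newpath",              # start a new, empty path
        f"{x1} {y1} moveto",    # set the current point
        f"{x2} {y2} lineto",    # add a line segment to the path
        "stroke",               # actually draw the path
        "showpage",             # display the page
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Same line as in mypost2: from (2 in, 1 in) to (4 in, 6 in)
    print(line_ps(144, 72, 288, 432), end="")
```

Redirecting the printed output to a file gives something you can preview or print just like the hand-written version.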
If you are like me, you will find the 1/72 inch coordinate system an annoyance.
Both Bob Parker and I prefer to use 1/1000 inch coordinates. This can be done by adding
the following line at the beginning of your file:
0.072 0.072 scale      % Coords are 1/1000 th inch
This means that objects will be drawn 0.072 times as large as they would have
been drawn with the old coordinate system. The coordinate 1000 thus now indicates
1000*(1/72)*0.072 = 1 inch.
Note that % is used to add comments. Anything to the right of the % is assumed
to be a comment (the first line is a special comment that the printer recognizes).
Why use 1/1000 inch as the coordinate instead of inches directly? The nice thing
about 1/1000 inch is that this is roughly the limit of what the human eye can resolve
on the page so it is not likely that one will need to divide things smaller than this.
Thus, one can avoid decimal places in the numbers and can save a little bit of space
in the file.
We can save more space if we shorten the moveto and lineto commands
since they will be used many times in a complicated file. We can shorten them by
defining our own shorter operators:
/m {moveto} def
/d {lineto} def
This tells the Postscript interpreter to replace m with moveto and d with
lineto. Our revised Postscript file thus becomes (mypost3):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale      % Coords are 1/1000 th inch
newpath
2000 1000 m
4000 6000 d
stroke
showpage
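Converting positions in inches to these 1/1000 inch coordinates is easy to automate. The following Python sketch assumes the points are given in inches; the helper names to_mils and line_ps_mils are hypothetical. It writes the same kind of file as mypost3, using the 0.072 scale factor and the short m and d operators:

```python
def to_mils(inches):
    """Convert a length in inches to integer 1/1000 inch coordinates."""
    return int(round(inches * 1000))


def line_ps_mils(p1, p2):
    """PostScript text for a line between two (x, y) points given in inches."""
    (x1, y1), (x2, y2) = p1, p2
    commands = [
        "%! PostScript file",
        "/m {moveto} def",
        "/d {lineto} def",
        "0.072 0.072 scale      % Coords are 1/1000 th inch",
        "newpath",
        f"{to_mils(x1)} {to_mils(y1)} m",
        f"{to_mils(x2)} {to_mils(y2)} d",
        "stroke",
        "showpage",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # (2 in, 1 in) to (4 in, 6 in) becomes 2000 1000 m and 4000 6000 d
    print(line_ps_mils((2.0, 1.0), (4.0, 6.0)), end="")
```

Note that 1000 of the new units times the 0.072 scale factor gives 72 of the default units, i.e., one inch.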
Now let's draw a box with thicker lines than our starting examples (mypost4):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale      % Coords are 1/1000 th inch
newpath
4000 4000 m
4000 5000 d
5000 5000 d
5000 4000 d
4000 4000 d
30 setlinewidth
stroke
showpage
We draw the four sides with four d (lineto) commands. Before actually drawing
the box, we set the line width to 30/1000 inch. The resulting plot will have a slight
notch in the starting corner. To avoid this, we can use the closepath command
(mypost5):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale      % Coords are 1/1000 th inch
newpath
4000 4000 m
4000 5000 d
5000 5000 d
5000 4000 d
closepath
30 setlinewidth
stroke
showpage
The closepath operator adds a line segment from the current point to the initial point in
the path, so we don't need the fourth lineto command. It also closes the path
with a mitered join, so we don't have the notch as in the previous example.
Once a closed path is defined, we can fill in the box using the fill command
(mypost6):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale      % Coords are 1/1000 th inch
newpath
4000 4000 m
4000 5000 d
5000 5000 d
5000 4000 d
closepath
0.5 setgray
fill
showpage
Before doing the fill, we set the gray level for printing at 0.5. The gray levels
range from zero (black) to one (white).
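The closepath and fill steps are easy to wrap up in a small generator. This Python sketch uses a made-up function name, filled_box_ps, and assumes the 1/1000 inch coordinates set up above. It writes a gray-filled rectangle and rejects gray levels outside the zero to one range:

```python
def filled_box_ps(x, y, width, height, gray=0.5):
    """PostScript text for a gray-filled rectangle; lengths in 1/1000 inch.

    gray follows the setgray convention: 0 is black and 1 is white.
    """
    if not 0.0 <= gray <= 1.0:
        raise ValueError("gray level must be between 0 (black) and 1 (white)")
    commands = [
        "%! PostScript file",
        "/m {moveto} def",
        "/d {lineto} def",
        "0.072 0.072 scale      % Coords are 1/1000 th inch",
        "newpath",
        f"{x} {y} m",                    # lower left corner
        f"{x} {y + height} d",           # upper left
        f"{x + width} {y + height} d",   # upper right
        f"{x + width} {y} d",            # lower right
        "closepath",                     # supplies the fourth side
        f"{gray} setgray",
        "fill",
        "showpage",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Reproduces the box in mypost6
    print(filled_box_ps(4000, 4000, 1000, 1000, gray=0.5), end="")
```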

Of course we can also print text of various sizes and types. Here is an example
(mypost7):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale      % Coords are 1/1000 th inch
/Times-Roman findfont
300 scalefont
setfont
2000 3000 moveto
(Example text) show
showpage
Here we set the type of font with /Times-Roman findfont and then set the
font size with:
300 scalefont
This sets the font height to 0.3 inches. The scaled font must then be made
the current font with the setfont command. We then move to the point (2000,
3000). Finally, we print the string "Example text" at the current location using the
show command:
(Example text) show
We must enclose the text in parentheses to denote it as a string. There are lots
of different choices for the fonts. The main choices for science graphics are Helvetica
and Times and their bold and italic variations.
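One detail to watch when generating labels from a program: because parentheses delimit the string, a parenthesis or backslash inside the text should be escaped with a backslash. Here is a small Python sketch of this; the helper names ps_string and label_ps are hypothetical, and the default font and size are simply the ones used in mypost7:

```python
def ps_string(text):
    """Return text as a PostScript string literal, e.g. (Example text).

    Backslashes and parentheses inside the text are escaped with a
    backslash so that they cannot end the string early.
    """
    escaped = (text.replace("\\", "\\\\")   # escape backslashes first
                   .replace("(", "\\(")
                   .replace(")", "\\)"))
    return f"({escaped})"


def label_ps(x, y, text, font="Times-Roman", size=300):
    """PostScript commands that show text at (x, y) in the current units."""
    commands = [
        f"/{font} findfont",
        f"{size} scalefont",
        "setfont",
        f"{x} {y} moveto",
        f"{ps_string(text)} show",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(label_ps(2000, 3000, "Example text"), end="")
    print(label_ps(2000, 2000, "f(x) = 2"), end="")
```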
Well, I could go on with these examples but it would be better to just consult a
Postscript book to learn more about all the things one can do. Two good ones are
the Postscript Language Tutorial and Cookbook and the Postscript Language Reference
Manual, both by Adobe Systems and published by Addison-Wesley.

9.1 PSPLOT Fortran subroutines

It is useful to be able to generate Postscript files directly from within a Fortran program. Given knowledge of the Postscript language, one could write the appropriate
commands to a file. For convenience, I have written a set of Fortran subroutines for
making Postscript files that handle a lot of the bookkeeping details and allow the
user to concentrate on the graphics. These routines are contained in:

/users/shearer/PROG/PLOT/psplot.f

To call these routines from an F90 program, they should be linked with:
/users/shearer/PROG/PLOT/psplotlib.a
Documentation for these routines is contained in:
/users/shearer/PROG/PLOT/psplot.man
Here is an example program that uses some of these routines:
program testpsplot
   implicit none
   call PSFILE('mypost')
   call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
   call PSAXES(1., 2., 0.05, 'i3', 5., 10., 0.05, 'i3')
   call PSLAX('X label', 0.5, 'Y label', 0.5)
   call PSMOVE(2., 2.)
   call PSDRAW(5., 8.)
   call PSEND
end program testpsplot
The resulting plot is shown in Figure 9.1.
Let us consider the commands one at a time:
call PSFILE('mypost')
This opens the Postscript file with name mypost (assigning it to Fortran unit
number 17; units 17 and 18 should not be used in any program that uses the psplot
routines). We could of course give the file any name we want at this stage. The
argument could also be a string variable, allowing the user to input the name.
call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
This defines a user coordinate system that will be used for subsequent psplot commands. The first four numbers (x1,x2,y1,y2) define the location of a coordinate
box on the page in inches. In this case, (1.5, 3.0) and (7.5, 7.0) define the (x,y)
coordinates of the lower left and upper right corners of this rectangle in inches.

Figure 9.1: The result of the testpsplot program.


The next four numbers (x3,x4,y3,y4) define the corresponding values at these points
for the user scale. Thus in this case, the lower left point in user coordinates is (0, 0)
while the upper right point is (10,20). All other points will be linearly interpolated
(or extrapolated) using these values. Often we will want to plot the frame defined
by this imaginary box (as in the PSAXES command below) but this is not required
to define the coordinate transformation.
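To make the linear interpolation concrete, here is a Python sketch of the mapping from user coordinates to page inches. It is not the psplot source code, only an illustration of the transformation described above; the function name make_window is made up, and the argument order simply follows the PSWIND call:

```python
def make_window(x1, x2, y1, y2, x3, x4, y3, y4):
    """Return a function mapping user (x, y) to a page position in inches.

    (x1, x2, y1, y2) are the limits of the box on the page in inches and
    (x3, x4, y3, y4) are the corresponding limits in user coordinates.
    """
    def to_page(x, y):
        # Linear interpolation (or extrapolation outside the box)
        px = x1 + (x - x3) * (x2 - x1) / (x4 - x3)
        py = y1 + (y - y3) * (y2 - y1) / (y4 - y3)
        return px, py
    return to_page


if __name__ == "__main__":
    to_page = make_window(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
    for point in [(0., 0.), (10., 20.), (2., 2.), (5., 8.)]:
        print(point, "->", to_page(*point))
```

With the window used in this example, the user point (0, 0) lands at (1.5, 3.0) inches on the page and (10, 20) lands at (7.5, 7.0) inches, which are the two corners given in the PSWIND call.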
call PSAXES(1., 2., 0.05, 'i3', 5., 10., 0.05, 'i3')
This draws a frame with tics and numbered tic labels as specified. The position
of the frame is assumed to be defined by the corners set in PSWIND. Here is the
documentation for the PSAXES command:
PSAXES(xtic,xlab,xticl,xfrmt,ytic,ylab,yticl,yfrmt) makes a frame
with tics and tic labels as specified. The position of the
frame is assumed to be defined by the corners set in WINDOW.
xtic = small tic increment
xlab = numbered tic increment (length is 2*nxtic)
xticl = small tic length (inches)
xfrmt = format for x-axis numbers (see PSNUMB for format details)
etc. for y
Note: Don't make tic increments negative even for reversed
scale plots. PSAXES should figure it out correctly. If you
don't want tic labels on a particular axis, set xfrmt or yfrmt
to a blank character, i.e. to ' '.

Next we draw axis labels with the PSLAX command:
call PSLAX('X label', 0.5, 'Y label', 0.5)
The labels are offset by 0.5 inches from the frame. Next we draw a line between
user-coordinate points at (2,2) and (5,8):
call PSMOVE(2., 2.)
call PSDRAW(5., 8.)
Finally, we must close the plot file:
call PSEND
If you leave out this step, the program will not put a showpage command at the
end of the file and your Postscript plot will not work!
WARNING: Most of the numerical inputs to the PSPLOT routines must be
real numbers! Do not use 2 instead of 2. because the integer and real binary
representations of 2 are different (the subroutines have no way of knowing that you
used 2 rather than 2.). Note, however, that you can use 'i3' for the format for the
axes numbers if you know that there won't be decimal places in the numbers.
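To see why this matters, it helps to look at the actual bit patterns. The Python sketch below assumes 32-bit integers and 32-bit IEEE reals, which is the usual default for Fortran compilers but is an assumption here; the function names are made up. It shows what a subroutine would get if it read the bits of the integer 2 as if they were a real number:

```python
import struct


def bits_of_int32(n):
    """Bit pattern of a 32-bit integer, as a string of 0s and 1s."""
    return format(struct.unpack("<I", struct.pack("<i", n))[0], "032b")


def bits_of_real32(x):
    """Bit pattern of a 32-bit IEEE real, as a string of 0s and 1s."""
    return format(struct.unpack("<I", struct.pack("<f", x))[0], "032b")


def int_bits_read_as_real(n):
    """Value obtained when the bits of integer n are interpreted as a real."""
    return struct.unpack("<f", struct.pack("<i", n))[0]


if __name__ == "__main__":
    print("integer 2 :", bits_of_int32(2))
    print("real 2.   :", bits_of_real32(2.0))
    print("integer 2 read as a real:", int_bits_read_as_real(2))
```

The two bit patterns are completely different, and the integer 2 read as a real comes out as a tiny positive number that is nowhere near 2.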
Here is a more complicated example that demonstrates many of the PSPLOT
subroutines. See the psplot.doc file for details.
program testpsplot2
   implicit none
   real :: x, y
   integer :: i, icol
   character (len=10) :: text
   call PSFILE('mypost')
   call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
   call PSAXES(1., 2., 0.05, 'i3', 5., 10., 0.05, 'i3')
   call PSLAX('X label', 0.3, 'Y label', 0.4)
   call PSTIT('Example title',0.2)
   call PSNOTE('note that you might want to put down here')

   call PSFRAM(1., 1.5, 2., 3.)

   call PSBOX(1., 1.5, 4., 5., 0.5)

   call PSCBOX(1., 1.5, 6., 7., 0., 1., 1.)
   call PSGRAY(0.)
   call PSMOVE(0.5, 9.)
   call PSDRAW(1., 10.)
   call PSDRAW(1., 15.)
   call PSTIC(0.1, 0., 1)
   x = 3.1415926
   call PSNUMB(x,'f6.3')
   do icol=1,15
      x=2.5
      y=float(icol)+2.
      call PSMOVE(x,y)
      call PSCOL(icol)
      call PSSYMB(-3,0.15)
      call PSMOVE(x,y)
      call PSTIC(0.2, 0., 0)
      call PSNUMB(float(icol),'i3')
   enddo
   call PSGRAY(0.)
   do i=1,9
      call PSLORG(i)
      x=8.5
      y=float(i)*2
      call PSMOVE(x, y)
      call PSSYMB(1, 0.1)
      write (text,'(a4,i1)') 'lorg', i
      call PSLAB(text)
   enddo

   call PSMOVE(5., 2.)
   call PSSYMB(1, 0.1)
   call PSANG(45.)
   call PSCOL(2)
   call PSLORG(1)
   call PSLAB('Red text')

   call PSMOVE(5., 7.)
   call PSSYMB(2, 0.05)
   call PSANG(90.)
   call PSCOL(3)
   call PSLAB('blue text')

   call PSMOVE(5., 12.)
   call PSFONT('hb14')
   call PSSYMB(3, 0.05)
   call PSANG(0.)
   call PSCOL(1)
   call PSLAB('bold text')

   call PSMOVE(5., 15.)
   call PSFONT('h8')
   call PSSYMB(3, 0.05)
   call PSANG(0.)
   call PSCOL(1)
   call PSLAB('small font')

   call PSEND
end program testpsplot2

Figure 9.2: The result of the testpsplot2 program.


and the resulting plot is shown in Figure 9.2.
The PSIMAG command allows the display of a bitmap in either black and white
or color. This can be rather tricky to use but is powerful once you understand how
it works. Here is an example program:



program testpsimag
   implicit none
   integer :: i, j
   integer*2, dimension(50, 50) :: ibuf
   integer*2, dimension(3, 50, 50) :: ibuf3
   integer :: red, green, blue
   real :: gray
   call PSFILE('mypost')
! First do black and white image
   call PSWIND(1.5, 7.5, 6.0, 9.0, 0., 50., 0., 50.)
   do i = 1, 50
      do j = 1, 50
         gray = sqrt(real(i*j))/50.
         ibuf(i,j) = nint(gray*255.)          !must be 0 to 255
      enddo
   enddo
   call PSIMAG(ibuf, 50, 50, 0)
   call PSAXES(5., 10., 0.05, 'i3', 5., 10., 0.05, 'i3')
   call PSLAX('X label', 0.25, 'Y label', 0.4)
   call PSTIT('Square root of x*y ',0.15)
! Then do color image
   call PSWIND(1.5, 7.5, 2., 5., 0., 1., 0., 1.)
   do i = 1, 50
      do j = 1, 50
         red = nint(real(i)*(255./50.))
         green = nint(real(j)*(255./50.))
         blue = 0
         ibuf3(1,i,j) = red                   !must be 0 to 255
         ibuf3(2,i,j) = green
         ibuf3(3,i,j) = blue
      enddo
   enddo
   call PSIMAG(ibuf3, 50, 50, 1)
   call PSAXES(0.1, 0.2, 0.05, 'f4.1', 0.1, 0.2, 0.05, 'f4.1')
   call PSLAX('Postscript Red', 0.3, 'Postscript Green', 0.45)
   call PSTIT('RGB colors, Green = 0', 0.15)
   call PSEND
end program testpsimag
and the resulting plot is shown in Figure 9.3.

Figure 9.3: The result of the testpsimag program.
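The key requirement for the image buffer is that every value must lie between 0 and 255. As a quick check of the scaling used for the black and white image, here is a Python sketch of the same computation. The function name gray_image is made up, and the rounding line only imitates the Fortran nint function for non-negative values:

```python
import math


def gray_image(n=50):
    """n-by-n list of byte values with gray = sqrt(i*j)/n scaled to 0-255.

    The indices i and j run from 1 to n, as in the Fortran loops.
    """
    image = []
    for i in range(1, n + 1):
        row = []
        for j in range(1, n + 1):
            gray = math.sqrt(i * j) / n          # between 1/n and 1
            # Round to the nearest integer (like nint for values >= 0)
            row.append(int(math.floor(gray * 255.0 + 0.5)))
        image.append(row)
    return image


if __name__ == "__main__":
    image = gray_image()
    values = [v for row in image for v in row]
    print("min =", min(values), " max =", max(values))
```

The smallest value occurs at i = j = 1 and the largest at i = j = 50, where gray is exactly 1 and the buffer value is 255.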
