
Performing Science

with
Open Source Software
___
Utilizing Python(x,y)

Hans Koch
Performing Science with Open Source Software

©2016 my.py
my.py@gmx.de
All rights reserved. No part of this book may be reproduced, in any form or by any
means, without the express written permission of the publisher. All product names
or icons mentioned herein are the trademarks of their respective owners.
ISBN-13: 978-1532799921

Welcome to Python in Science!


A few short stories will introduce you to some very versatile tools, which might well
support you in your daily life as an engineer or scientist. The emphasis is on realistic
applications, not so much on language skills. Nevertheless, you will pick up some
basic Python on the fly. The book intends to help you over the first hurdles.
Once your appetite is whetted, you'll become addicted, I'm sure!


Contents

page

Chapter 1 1
To whom it may concern
python(x,y)

Chapter 2 7
Brave New World
python

Chapter 3 17
All strings in your hand!
spyder

Chapter 4 23
Paint it black!
matplotlib

Chapter 5 43
Dress to impress!
mayavi

Chapter 6 55
Bake your own .py!
my.py

Chapter 7 67
Will you still feed me?
physionet

Chapter 8 81
Like a Nobel laureate
numpy


Chapter 9 95
Let it roll!
vtk

Chapter 10 107
Some physics
scipy

Chapter 11 117
Perfect fit?
itk

Chapter 12 131
Glue
itk-vtk

Chapter 13 143
Back to school
sympy

Chapter 14 157
Optimistic optimization
cvxopt

Chapter 15 167
Are you certain?
uncertainties

Chapter 16 177
Your own statistics
statsmodels

Chapter 17 187
Publish or perish!
reportlab

Chapter 18 199
Tools
Table of tools



Chapter 1
To whom it may concern
1.1 Who are you?
A student? Fine! A scientist? Welcome! An engineer? All right! A start-up
entrepreneur? Even better! A beginner in programming? Well ... maybe. Some basic
knowledge of programming would be useful.
All of you could benefit from this book. It provides a guided tour through a range of
useful applications of open source software tools. It introduces a selection of
attractive examples with the aim of making you productive immediately. Just to make
your mouth water...

1.2 Who am I?
I am a senior scientist, not a computer freak. I use software not as l'art pour l'art, but as
a means to get results and to understand things better. Modelling, data analysis,
visualization and documentation require skills which everyone in science has to
refine over his or her scientific career. I would like to make you aware of some
toolboxes that you will not want to miss in the years to come.
This book evolved from the notes of a lecture I gave at the Technical University of
Berlin in the physics department. Thus the audience consisted of students majoring
in physics. However, students from other disciplines, e.g. engineering, could have
participated with equal benefit. I myself head the division of 'Medical Physics and
Metrological IT' at the Physikalisch-Technische Bundesanstalt (PTB), the German
National Metrology Institute. Therefore, many examples and applications in this
book emerged from this context.
Finally: I am not an expert in Python programming. Only recently did I learn this
language to a basic level. Real Pythonistas will detect this immediately and will
surely have numerous ideas for improving the code.

1.3 Then, why Python?


Actually, some of the tools presented in this book were originally written in other
languages like C++, Tcl, Fortran, etc. and have been wrapped or ported to Python.
I will explain how to deal with this.
There is no particular preference for the language as such. The main reason is Python(x,y),
this wonderful basket of Open Source Software toolboxes, in which Python is the
bracket that keeps everything together.


The basics of Python are relatively quick and easy to learn. The code can be kept
simple but may be adapted to highly complex problems as well. A very powerful
language indeed!

1.4 "Should I learn Python before I start with this


book?"
You could. However, I would propose to follow the style of "learning by doing". I
recommend to utilize Spyder for programming, the very convenient Integrated
Development Environment (IDE) especially suited for Python, and which will be
introduced in chapter 3.

1.5 Why Python(x,y)?


Well, Python(x,y) is a comprehensive collection of open source program toolboxes
satisfying nearly every need you might have in your scientific career, except in very
special cases where you might need very special software. You could also download
and set up most of the toolboxes individually, but that would be a very tedious
enterprise. I myself failed miserably at setting up e.g. ITK just by following the advice on
ITK's homepage. Too complicated for a non-freak. With Python(x,y) the whole
bouquet of programs is downloaded and installed at once and without pain! In
addition, a sack full of relevant documentation is installed together with this parcel.
This alone makes it a pleasure to work with Python(x,y)!

1.6 Why Open Source Software?


First obvious reason: it is free of charge! Good for students and young entrepreneurs.
Often the licensing policies are very attractive. Good for start-ups. Nevertheless, you
have to check each case individually.
But the main advantage: some open source tools are clearly competitive in quality
with their commercial counterparts. Linux and Wikipedia demonstrated quite
convincingly that a transparent grassroots activity that receives control and feedback
from the public is able to create a superior product. Well, admittedly, not all open
source products are worth a try. Nevertheless, Darwinian evolution brings
out some real winners.

1.7 Where to find what?


That's the biggest problem. You can easily get lost in the seemingly limitless
software market. How do you find the goodies? ...and if you have found some: where are
good documentation, helpful examples, explanations of error messages, etc.?


Of course, modern search engines can be of help in many cases. But even with
them, blind alleys are numerous.
Another problem: ever-changing versions, updates, variants, ...
All this motivated this book. Besides introducing you to a selection
of goodies that may be immediately useful for your daily work, it intends
to show some strategies for surviving in this jungle. Because the speed of change
in this field is so dramatic, I have tried to explain things in a more general manner
and not so much step by step, since what may be true today can be outdated a few
days later. Just look at the homepage of Python(x,y) to see how often the versions
have been changed and updated!
The main objective of this book is to whet your appetite and to suggest how you
can help yourself when diving into this fantastic universe of Open Source
Software.

1.8 Load down!


As I mentioned before: things change frequently. The sites and methods for downloading
the Python(x,y) parcel have changed in the past as well. Try
https://python-xy.github.io/
or search for the homepage of Python(x,y) and follow the instructions for download.
Watch out during installation! If the following window appears, then choose either
the "Full" installation or the "Custom" one. In the latter case, you may find yourself
missing components that you need in later chapters, e.g. "ETS", "ITK", or others.

Alternatively, you may click the square with the "+"-sign beside "Python" and
check which components you want to load.

4
Performing Science with Open Source Software Chapter 1 To whom it may concern

At any later time you may add needed components in the same way.
After a successful installation the following icon (or an updated version of it) should
appear on your desktop:

Double-click and continue with chapter 2!

1.9 Alternative: Anaconda


If Python(x,y) should ever disappear, which I hope it does not, then Anaconda
https://www.continuum.io/why-anaconda
would be an alternative. Not a bad one! Some favor it, others don't. It is a question of
preference. I started with Python(x,y) and I stick to it.




Chapter 2
Brave New World
Of course this phrase is far too euphemistic. More down-to-earth: we are talking
here about the entrance to the cosmos (still not really down-to-earth) of Python(x,y).
Depending on the circumstances, it can be the entrance to fun, frustration, hard work,
welcoming help, you name it. However, you will nearly always pass through
this gateway.

2.1 The launcher

Welcome! Here is where you start. Looks simple, doesn't it? Your Python(x,y) home
might differ in detail if you have a more recent version of it.
Of course you may start playing around with the few buttons, pull-down selections
and tabs. Who really reads user guides from start to finish? However, after a
while of trial and error you might want a hint where to start. Well, I would suggest
two possibilities: Route A or Route B. If you cannot wait to start with a real
application, jump to the end of this chapter, to Route B; otherwise just continue.


2.2 Route A: for Python beginners


If you are a newcomer to Python, it is worthwhile to learn at least some basics
with a Python tutorial. You will find several introductions when selecting the
"Documentation" tab, and under "General Documentation:" the "Documentation
Folder":

In that documentation folder you should open the Python folder. Here you will find quite a few
introductions to Python and related topics, demanding different levels of skill and
previous programming knowledge. If these do not live up to your expectations,
you may find other tutorials on the Web. Look e.g. at
the first chapter of David M. Beazley's "Python Essential Reference". Another
choice is the tutorial on the official Python web site.
Just reading a tutorial alone may not be helpful. Learning by doing and testing your
own variants of what is presented will promote your skills more effectively. Then
you should open an interpreter.
If you want the same look and feel as in the official Python tutorial you
might use the official Python interpreter. On the Python(x,y) launcher select the
"Shortcuts" tab and under 'Interactive consoles:' pull down and select 'Python', and
finally push the button beside it:

...and it will appear - after a while - at your service:


Another fine test bed for your first steps could be IPython(Qt):

Now, after a while, the interactive console should appear and you may enter your
first line of Python code after In [1]: as shown below:


If you hit the enter key... Eureka! Your first homemade Python output! Continue
with the following inputs. You will notice that:
• each input line is followed by some kind of output,
• each input line is numbered,
• you may use the "up"- and "down"-arrow keys to climb up and down through the older
input lines,
• if you want to modify an old input (e.g. because you are too lazy to retype it), just
climb to that line, modify the text and push the Enter key: a new input line has
been generated. In the example shown below, I climbed back to input
In [1]: and replaced print by a = using the leftward arrow key, and
then obtained input In [2]:.

2.3 If you can't wait: Python's basic Basics


In case you are eager to obtain your first useful and relevant result without wasting
time ploughing through the tutorials, just follow me and learn the basic basics on
the fly. You may complete your skills later.
As a start let us inspect the main object types, operators, functions, and keywords.
With your first input you already learned your first keyword, i.e. print, and
your first object type: a string object. A string variable contains text, and that text may
contain numbers and symbols as well. You will see the difference between real numbers and
numbers of the string type a few lines later.
It is quite obvious that the keyword print does what it says: it prints the object
which follows this keyword.

2.4 Strings
The second input gives the string object a name, or more precisely: "a" is
assigned to the string object "Hello World!".
Your interpreter recognizes a string type variable by the quotes, either
"...text..." or '...text...' or """ ...text over several
lines and/or containing quotes or other symbols ... """.
If you now enter in In [3]: just the variable a itself, then the output tells you
what a stands for.
A quite useful function is type(), if you want to find out what type of object you
are dealing with. In this case, it is a string.
In [1]: print "Hello World!"
Hello World!

In [2]: a = "Hello World!"


In [3]: a
Out[3]: 'Hello World!'

In [4]: type(a)
Out[4]: <type 'str'>
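As a small side note, here is a sketch of the three quoting styles mentioned above; all
three produce ordinary string objects (my own example, not part of the numbered session):

s1 = "double quotes"
s2 = 'single quotes'
s3 = """a string that spans
several lines and may contain "quotes" as well"""
print type(s1), type(s2), type(s3)   # all three are of type 'str'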

2.5 Integer and floating point numbers


The following lines speak for themselves. Just by assignment, the object type of the
variable is fixed. This may seem trivial, but other programming languages
declare the variable type in a more complicated manner.
Watch the #-symbol: it starts a comment, i.e. everything following this sign until the
end of the line will be ignored by the program and serves only for clarification of the
code. Thus, when you reproduce my code on your computer, you do not need to type
the comments. If a comment in this book is longer than one line, the line feed symbol \n
marks that it continues on the next line.
In [5]: b = 30

In [6]: b
Out[6]: 30

In [7]: type(b) # b is an integer number (what else?...)


Out[7]: <type 'int'>

In [8]: c, d = 7.2, '5'


#instead of writing c = 7.2 and in a new line d = '5'

In [9]: c
Out[9]: 7.2

In [10]: type(c) # c is a floating-point number


Out[10]: <type 'float'>

In [11]: b = c

In [12]: b
# b no longer stands for the original integer number 30 \n
but has changed its type and content

Out[12]: 7.2

In [13]: b = int(b)
# function 'int()' converts b from float type to integer type
In [14]: b
Out[14]: 7
# but of course the digits after the decimal point got lost

In [15]: b = float(7) # conversion to float type

In [16]: b
Out[16]: 7.0


In [17]: type(d) # see: In[8]!


Out[17]: <type 'str'>

In [18]: d = int(d)
# a string type can be converted to an integer as well

In [19]: d
Out[19]: 5

2.6 Simple Operations


In [20]: print d + 6 # summation
Out[20]: 11

In [21]: print -d - 6 # subtraction with a negative d


Out[21]: -11

In [22]: d = d + b
# watch: d has been an integer, but when an operation \n
with a floating point variable occurs, then the result \n
is a floating point variable as well

In [23]: d
Out[23]: 12.0

In [24]: print 3 + 4, '3' + '4'


# the '+'-operator acts differently for numbers and strings!
Out[24]: 7 34

In [25]: g = str(int(d * 2))


# d multiplied by 2, then altered to an integer, \n
and finally to a string

In [26]: g
Out[26]: '24'

In [27]: print g * 3
Out[27]: '242424'
# the '*'-operator acts differently for strings!

In [28]: h = int(d)/2 # a division

In [29]: h
Out[29]: 6

In [30]: print h**2 # h to the power of 2


Out[30]: 36

In [31]: d + g
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'float' and 'str'

Oops! We produced our first error. It is of course not possible to add a 'float' and a
'string' variable! The error message says just this.
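Just as an aside, a minimal sketch of how you could make the intended operation work by
converting one of the operands first (continuing with d = 12.0 and g = '24' from above;
my own lines, not part of the session):

print d + float(g)   # 12.0 + 24.0 gives 36.0
print str(d) + g     # '12.0' + '24' gives the string '12.024'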


2.7 Sequences
Sequences are collections of items. Here I introduce three different types:
• Actually 'str' is one type of sequence and - as we have already learned - is
identified by quotes.
• Tuples are characterized by a collection of items which are separated by
commas. The items may be of different types. Even tuples can be items
of a tuple. Beginning and end of a tuple may be indicated by round brackets.
• Lists are like tuples with one important difference: they are mutable, i.e. their
items can be changed by various operations, while tuples can be modified only
in a very limited manner. Lists are identified by square brackets.
Please note: the sequences mentioned above are not 'arrays'! We will work with the
latter in a later chapter dealing with "NumPy".
For a change, let us start a new IPython-session, i.e. starting with input In [1]:.
In [1]: a = "This is a string"
In [2]: b = 8, 9.2, '70 days' # This is a 'tuple'

In [3]: b
Out[3]: (8, 9.2, '70 days')

In [4]: c = [a, 3, 6.4, b]


# This is a 'list' containing among other items a 'tuple'!

In [5]: c
Out[5]: ['This is a string', 3, 6.4, (8, 9.2, '70 days')]

Now, what is common to all sequences is the indexing, i.e. the numbering of the
items. The counting starts with index '0', thus the third item has the index '2'. If you
want to know e.g. the value of the first item of a sequence, proceed as follows:
In [6]: print a[0], b[0], c[0]
Out[6]: T 8 'This is a string'

If you want to know the second and third item together you ask for an index range
from index i to index j: [i:j]. This procedure is called "slicing".
In [7]: print a[1:3], b[1:3], c[1:3]
Out[7]: hi (9.2, '70 days') [3, 6.4]

Note that j is equal to the index of the last item included plus 1! Now try to replace the third
item in a 'list' and in a 'tuple'.
In [8]: c[2] = 'replaced'
In [9]: c
Out[9]: ['This is a string', 3, 'replaced', (8, 9.2, '70 days')]
In [10]: b[2] = 'replaced'
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment


See the difference? A list is mutable; a tuple cannot be changed in this way. Can you guess
what c[3][2] is?
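In case you want to check your guess right away, here is a quick sketch (my own lines,
outside the numbered session above):

c = ['This is a string', 3, 'replaced', (8, 9.2, '70 days')]
print c[3]       # the tuple stored as the fourth item of the list
print c[3][2]    # 70 days - the third item of that tuple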

2.8 Do not get surprised!


Sometimes the program does something you do not expect. E.g., be aware of the
following:
In [11]: d = c
In [12]: d[3] = 'change'
In [13]: c
Out[13]: ['This is a string', 3, 'replaced', 'change']

Not only d[3] is replaced, but c[3] as well! The reason: the assignment d = c does not
copy the list, it merely makes d a second name for the very same list object. If you want
to avoid this, make a copy of c; then a change in d does not affect c.
In [14]: d = copy(c)
In [15]: d[3] = 'again'
In [16]: c
Out[16]: ['This is a string', 3, 'replaced', 'change']
In [17]: d
Out[17]:
array(['This is a string', '3', 'replaced', 'again'],
dtype='|S16')

However, d is now of a different type! Strange, isn't it? More on this phenomenon later.
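The copy() used in this pylab-flavoured console evidently comes from numpy and
therefore returns a numpy array. If you want a copy that stays a list, here is a minimal
sketch using the standard library instead (my own example, book-style Python 2 syntax):

import copy

c = ['This is a string', 3, 'replaced', (8, 9.2, '70 days')]
d = copy.copy(c)   # shallow copy; d remains a list
d[3] = 'again'
print c            # the original list is unchanged
print type(d)      # <type 'list'>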

2.9 Route B: for Python(x,y) starters


Click the Spyder symbol and continue with chapter 3!


2.10 Cheat sheets


Cheat sheets helped back in school, and they help in programming as well. The Web is full of
them, e.g.:
"http://coffeeghost.net/pybat/python_cheat sheet.png".
Hopefully it is still available, but there are many others.


Chapter 3
All strings in your hand!
3.1 Your IDE
If you want to work properly you need a proper workplace, an IDE. IDE means
Integrated Development Environment. Well, there are different choices. Eclipse is one
possibility. It was the IDE I used with the first versions of Python(x,y). However,
once Spyder was provided, I switched to it and stayed. My proposal: choose it as
your environment as well.

3.2 Spyder
If you hit the Spyder button on the Python(x,y) gateway, ... after a while ... Spyder will
show up. Welcome!

... and a while later, the IDE emerges.


It will probably look different from the figure shown above, because this workplace
can be modified in numerous ways. One can tailor it to one's individual preferences.
But before you try to alter its design by trial and error, or by carefully reading the help
instructions in the 'Help' menu of the Spyder menu bar, you may follow
my instructions first. It will be advantageous if we start with the same, or at least
a similar, design. Later, when you are more experienced, you will change the design
anyhow, I guess.
In the screenshot above the Python interpreter is visible in the console pane. You
may instead activate the IPython interpreter, if you prefer it (see figure below), or
vice versa.

Next, look for the menu bar (the uppermost bar), pull down the 'View' menu,
choose "Panes" and "Toolbars" and activate those items with a tick as has been
done in the following screenshots. With that, we should be on equal terms. If not, due
to a new Spyder version or other reasons, the layout should at least look similar.
This is a minimalistic choice in order not to get confused, something we need at the
beginning. Later you will discover many other useful tools that we have not activated
for the moment.


After this customization, you reign over three application windows:


• An 'Editor' on the left. We will come to it immediately.
• On the upper right a file explorer showing your file system. You will be familiar
with it, as it is similar to what you know from other Windows applications.
• On the lower right a so-called 'console' containing at the moment - if you have
chosen it - something you already know from chapter 2: the 'IPython'
interpreter.
Just out of curiosity, type the following input lines into this interpreter:
In [1]: a = ()

In [2]: type(a)
Out[2]: <type 'tuple'>

You have assigned the empty tuple () to a. We will come back to this later. For now,
we move on to our main workhorse: the 'Editor'.


3.3 The 'Editor'


• Create a folder, e.g. under 'C:', and name it e.g. 'py_files'.
• Open a 'New file' in the usual way ('File' pull-down menu) or by clicking the
'New file' symbol in the toolbar. This file will have the provisional name 'untitled1.py'. All
Python files have the ending '.py'. You should rename this file, e.g. to 'test_0.py',
via 'Save as' under the 'File' menu and save it in the newly created sub-folder
named 'py_files'.
• Below the first few automatically generated lines (which you might even
eliminate) please enter:
a = ()
print type(a)

• ... and hit the 'Run' button (situated on the toolbar).


Now the respective changes will appear in the 'File explorer' panel as well as in the
'Console' panel. In addition, in the object inspector panel the documentation of the
object 'type' automatically showed up while you typed it in the editor!

The 'File explorer' presents an update of your file system, because your new
folder 'py_files' is now included, containing the file 'test_0.py'. Either you can see it
already or you have to browse to the right location in the file system displayed.


The 'Console' window now displays the output of the program 'test_0.py', i.e. it
informs us that object a is of the type 'tuple'. A result which required a number of
actions with the 'Editor', in contrast to the immediate output we obtained with the
'Interpreter'. So, what are the advantages of an 'Editor'? Let us look for the
differences.

3.4 Editor vs. Interpreter


Already the second input line we typed in both modes shows a subtle difference: in
'IPython', we just had to write type(a) in order to obtain an output, while in the
'Editor' only print type(a) generates an output. Try to run the
mini-program without the print keyword: nothing will happen.
A program written in the 'Editor', and then compiled and run, will only produce
an output if it is 'told' to do so by respective keywords or functions. With the
'Interpreter', you obtain information whenever the latest input line contains an object (and not
an assignment). This information could be the value(s) of the variable, the
type of the object, an error message, etc.
However, the big difference between 'Editor' and 'Interpreter' becomes obvious
when you exit Spyder and start it again: the contents of the 'Interpreter' are gone, and
you have to start from scratch. Not so comfortable for multiline code of
considerable size! In the 'Editor', the whole script is still present after restarting
Spyder. Even if one closes that file in the 'Editor', it is very convenient to retrieve
it again: if you double-click the respective file in the 'File explorer', it will immediately
appear in the 'Editor' again. Another advantage of the 'Editor': it is much more
comfortable to - as the name says - edit the script, with 'cut and paste', 'search and
replace', 'comment/un-comment', debugging, auto-completion, etc.
Thus, generally speaking, the 'Interpreter' is best suited for "quick and dirty" jobs, if
you want to make a quick calculation or test a few lines of code on the fly. For more
sophisticated programming, something you want to keep and/or modify frequently,
the 'Editor' will be your workbench.

3.5 Do it yourself
Take your first steps with the Spyder tutorial and try the various menus, toolbars and
help functions. If you do not yet understand some terms and explanations -- don't
worry. Comes time, comes comprehension...


Chapter 4
Paint it black!
4.1 Aim: a publishable figure
Reputable scientific journals request that figures follow certain style rules, which
differ from journal to journal. Papers will not be accepted if e.g. line widths, letter
sizes, figure captions, etc. do not conform to the rules outlined in the instructions
for authors. Note e.g. the instructions of the American Physical Society (APS):
https://authors.aps.org/STYLE/ms.html#figures. A quotation from the latter: "1. Line
drawings (original drawing or photograph of drawing). This kind of figure is
considered the most desirable. Use black ink on a white background." Hence our
chapter title: Paint it black!
Another citation which shows the bad experience APS has with submitted figures
generated by computer programs: "4. Machine-generated (computer output or
material reproduced directly from automatic plotters). Figures of this type are
sometimes not acceptable because of unsuitable lettering size, lettering quality, or
curves that break up when the figure is reduced. Figures must have clear background
and unbroken lines with as much black and white contrast as possible."

4.2 Jump into cold water ... and ask for help!
A very convenient way to proceed is to cannibalize already existing code and adapt
it for one's own purposes:

• Start Spyder and make sure to use the Python Console.

• Open a new file in the 'Editor'.


• Open the homepage of 'matplotlib' via the Python(x,y) launcher (cf. left) or
use Spyder's help directly (cf. right):

• Look for and open the 'Gallery' (menu bar) and select among the many
examples the following figure (check the pylab examples and look for 'log_test')
and double-click it:

Now the following page should appear:


We see the figure and, below it, the code which generates it. Below the headline you
have the choice of accessing the source code, a png file, a high-resolution png file, and a
pdf file of the figure.

• Now, let us 'copy' the source code here and 'paste' it into the 'Editor' in Spyder,
'save' it as your own file (e.g. as 'my_first_fig.py') and 'run' it. "Your" figure
should now appear somewhere. Or... maybe it is hidden behind the Spyder
window.


I guess you will be disappointed by the result if you are used to other plotting
programs. It took a long while until the Figure window popped up (if you don't see it:
it may be hiding behind the Spyder window). This application window does not allow
many manipulations, and those that can be done cannot be performed in a very
controlled and reproducible manner. Play around with it and you will see.
There is a description of this Figure window and the interactive navigation in
chapter four of the user's guide for matplotlib, which is available in the pull-down
help menu (under 'Installed Python modules'). If you pan and scale and alter the
aspect ratio, and then save the result, the saved figure turns out to be
non-optimal.
Please keep calm. Things will improve. For now, we should focus on analyzing the
source code. It is quite simple and almost self-explanatory:
1 #!/usr/bin/env python
2 from pylab import *
3
4 dt = 0.01
5 t = arange(dt, 20.0, dt)
6
7 semilogx(t, exp(-t/5.0))
8 grid(True)
9
10 show()


The first line starts with the '#'-sign. Remember, this identifies a comment line, i.e.
something the interpreter more or less ignores. But comment lines are quite useful for
us users. Besides providing us with information or improving the documentation, they
are a helpful tool for analyzing a program. If your cursor is placed in a certain line, or if
several lines are selected, then one can 'comment' or 'un-comment' such lines with
the respective button (to be found in the Spyder toolbar).
Try to 'comment' and 'un-comment' the function grid(True) in line 8. If you run
the program with the line commented out, then the figure is not gridded. Surprising, isn't
it? We would obtain the same effect if the un-commented grid(True) were
followed by grid(False).
But let us start from the beginning with line 2. 'pylab' is a matplotlib-based Python
environment, and what we do with this line is to import all ('*' is the wildcard)
names of 'pylab' from 'pylab'. However, this is not the recommended way to
import modules (see the next sub-chapter). Not without reason is there a warning
sign adjacent to this line of code.
The pylab mode provides all of the 'matplotlib' plotting functions, as well as
non-plotting functions from 'numpy' and 'matplotlib.mlab'. NumPy is another
module, which we will introduce in more depth in one of the next chapters.
arange() in line 5 is one of numpy's functions. Let us find out what it does by
asking for help. There are several possibilities to obtain help:

• Quick and quite informative: activate the 'Object inspector' (check the 'View'
menu in the Spyder menu bar). Before going on, check the "Preferences" for the
"Object inspector" (Spyder's "Tools" menu!) and tick the boxes for "Editor"
and the two consoles.
• If you have done so, then while you are typing the name of a function - even
before you finish the name - a small help window pops up in the
editor as a line-continuation help, providing you with the available completions.
So you may select 'arange' here, and when you continue with the left bracket
'(', an explanation of the function appears in the 'Object inspector',
leaving you more time to scratch your head.
• For lonesome hours: study the numpy documentation ('Help' menu in the
Spyder menu bar).
• Or just add the line print t to your script and guess from the output what
happened.
Obviously line 7 in the original code contains the function to be plotted. But not only
that. It also takes care of the semi-logarithmic diagram design.
Finally, show() is the function which triggers the 'Figure' panel with the plot.
Actually, in recent versions one does not need this function anymore; the 'Figure'
panel will show up anyhow.


I propose that you do not save the figure via the 'Figure' panel but by adding the
following line:
savefig('C:/py_files/my_first_fig.eps', dpi=600)

to your script and running the program again. This image file in 'encapsulated postscript
(EPS)' format has a resolution of 600 dpi. If you had chosen .png or
.jpg instead of .eps, you would have created figures in those formats. You
may import these files into your text or presentation documents. See below how the
saved figure looks:

Note:
Each time you re-run the program a new Figure panel will be created. They have to
be closed manually, otherwise they pile up. You may also perform this clean-up by
pushing the warning icon on the console panel and then the run triangle at the
same position, so that the warning icon appears again.
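If you prefer to avoid the piling up of figure windows altogether, you can also close them
from within the script. A small sketch based on the example above (with the pylab-style
import, close() is available directly):

from pylab import *

close('all')   # close all figure windows left over from earlier runs

dt = 0.01
t = arange(dt, 20.0, dt)
semilogx(t, exp(-t/5.0))
grid(True)
savefig('C:/py_files/my_first_fig.eps', dpi=600)
show()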
So far, a short lesson on how to obtain a plot. Of course, this is not yet the style
required by serious scientific journals. Next, let us see how we have to modify it.


4.3 Getting confused


When you try to work through some of the other examples in matplotlib's gallery,
you will observe that various code styles exist. It seems as if everyone has his or her
individual programming philosophy, and matplotlib supports all these styles. But when
you try your own first steps, you will soon find that there are some rules
one has to obey, otherwise one pitfall is followed by the next.
Three approaches are commonly used and will be introduced here.
Let us modify the code we used in the last chapter by entering our own function and
plotting it in the usual design with linear x- and y-axes:
1 from pylab import *
2
3 t = arange(0.0, 1.001, 0.001)
4 y = cos(16.0 * pi * t) * exp(-t / 0.4)
5
6 plot(t, y)

The code is very similar to the well-established MATLAB/Mathematica style. It
appeals through its clean appearance and reduced typing, and is thus well suited for
use in interactive interpreters like IPython.
A different style, which should be preferred in program scripts, looks as follows:
1 import matplotlib.pyplot as plt
2 import numpy as np
3
4 t = np.arange(0.0, 1.001, 0.001)
5 y = np.cos(16.0 * np.pi * t) * np.exp(-t / 0.4)
6
7 plt.plot(t, y)

This style has the advantage of better transparency, whereas in MATLAB-style coding
many things happen in the background and the functions and modules come from one
big, anonymous pool. In the code above, the aliases np (for numpy) and plt (for
matplotlib.pyplot) in front of each function or module indicate where it comes from
and thus reduce ambiguity.
Now a third variant with even more typing:
1 import matplotlib.pyplot as plt
2 import numpy as np
3
4 t = np.arange(0.0, 1.001, 0.001)
5 y = np.cos(16.0 * np.pi * t) * np.exp(-t / 0.4)
6
7 fig = plt.figure()
8 ax = fig.add_subplot(111)
9 ax.plot(t, y)


This code is more object-oriented. The advantages of this approach will become obvious
only later. For now it looks as if things are made needlessly complicated. Just let me quote
a comment from matplotlib's user guide: "So, why do all the extra typing required as
one moves away from the pure MATLAB-style? For very simple things like this
example, the only advantage is educational: the wordier styles are more explicit,
clearer as to where things come from and what is going on. For more complicated
applications, the explicitness and clarity become increasingly valuable, and the
richer and more complete object-oriented interface will likely make the program
easier to write and maintain."
Another cause of confusion might be the difficulty of finding out why certain things
happen although they do not seem to be contained in the code. An example in the
above scripts: Why is the line representing the function in the figure colored blue?
Even stranger: if we add another line, e.g. by drawing a baseline with
ax.plot([0,1],[0,0])

then this line will be green!? ...and how may we change this line style?
If we consult the homepage of matplotlib and use the 'Search' function by entering
'color' into the 'Search' window, we will end up in a big mess: hundreds of links
without any information that could help us beginners. The situation improves a little
if we select the doc menu on the Python(x,y) launcher, look for the
matplotlib user guide and then search this pdf file for 'color'.
A more promising way is again to consult the object inspector and enter 'plot' as the
'Object':

The first line, plot(*args, **kwargs), indicates that the 'plot' function can
be called with arguments ("*args") and in addition may be specified by so-called
keyword arguments ("**kwargs").


Arguments are the x- and y-variables and optional string variables which control the
formatting. Several possible format strings are given further down in the plot
function description. The different possibilities for modifying the style via keyword
arguments are introduced there as well. Try several variants in order to get acquainted.
... and further below is the information about matplotlib colors:

Although one could change the line color to black and increase the linewidth by a
factor of three with the following minimalist typing:
ax.plot(t, y, 'k', lw=3)

I would prefer:
ax.plot(t, y, color='black', linewidth=3)

The latter is a more transparent, self-explanatory coding style.


Coming back to the question: why were the lines blue and green when we did not
specify the line color by 'k' or 'black'? The answer: by importing all
matplotlib.pyplot functions with the first line of our script, we also imported a file
which determines the default parameters of a great many figure properties.
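If you are curious where these defaults live: they are collected in the rcParams dictionary
(filled from the matplotlibrc file). A minimal sketch of inspecting and overriding two of
them; the exact default values depend on your matplotlib version:

import matplotlib as mpl

print mpl.rcParams['lines.linewidth']   # inspect one of the defaults
mpl.rcParams['lines.linewidth'] = 3.0   # thicker lines for all following plots
mpl.rcParams['font.size'] = 14.0        # larger default font size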
With this last code line (the ax.plot call with color='black' and linewidth=3), the last script would produce the following figure:


4.4 Getting serious


After all these preliminaries we should finally approach our goal: a figure
which is publishable in a serious scientific journal. We have to take care that the
linewidths and letter sizes are still of high quality when the figure is
reduced to approx. 8 cm text-column width. The resolution should be at least 300 dpi
and the figure file should be of the postscript (.ps), encapsulated postscript (.eps)
or .png type.
First of all, let us define the figure size ourselves, overriding the (hidden) default
values. Since matplotlib is a US product, but I am from the rest of the world, where
SI units are commonplace, I converted the inch values into mm units and
chose an appropriate aspect ratio. Now the figure size fits well onto a letter-size page
in landscape orientation. This is what the scientific journals like to receive in order to
be able to produce a high-quality reproduction. Be aware that this figure will be
downscaled heavily when published.
Check the next screenshot for the respective code in the 'Editor' panel. See how the
image is saved, and after running the program, notice that the result is listed in
the 'File explorer'. Double-click it here and the image will pop up. This reflects the
result more correctly than the matplotlib image pane, which I eliminated by:
plt.close(1)


When inspecting this script, you will recognize that we more or less repeated the
code shown in the previous sub-chapter, i.e. the third coding-style variant. The only
major difference is line 10: this code defines the figure's axes frame in relative units,
i.e. the axes origin is positioned at x0 and y0, which are chosen as 0.2 times the
available figure width and 0.2 times the available figure height respectively. The
width of the x-axis is 0.75 times the figure width and the height of the y-axis is 0.75
times the figure height. As we will see soon, this broad space at the left side and the
bottom of the figure is needed for the labelling of the axes.
In the following script a few more lines have been added, leading to a better-looking,
but still intermediate, result.
5 t = np.arange(0.0, 1.001, 0.001)
6 y = np.cos(16.0 * np.pi * t) * np.exp(-t / 0.4)
7
8 mm = 0.1/2.54
9 fig = plt.figure(figsize=(200.0 * mm, 160.0 * mm))
10 ax = fig.add_axes([0.2, 0.2, 0.75, 0.75])
11 ax.plot(t, y, color='black', linewidth=3)
12
13 ax.tick_params('both', labelsize= 24, pad = 16, length=10,
14 width=1.5)
15 ax.set_xlabel(r'time ($\mu$s)', fontsize = 30)
16 ax.set_ylabel(r'amplitude ($\mu$V)', fontsize = 30)
17 ax.axhline(linewidth=1.5, color='black')
18
19 plt.savefig('C:/py_files/fig_1.png', dpi=300)
20 plt.close(1)


Since the ticks were too faint and too small, I changed the tick parameters with
lines 13/14. How did I find out that this is the way to do it? Well, that was not easy.
The various matplotlib tutorials did not help.

4.5 My trick
If I want to find out what possibilities exist for e.g. processing the axes object ax in the
script above, I proceed as follows:

• Copy those lines in the editor which are essential with respect to ax and
paste them into the console panel. If you are lazy, you could also paste all
lines from the beginning up to line 10. In the console each line will
automatically be entered one by one, headed by '>>>'.

• Next, add a new line in the console pane, starting with ax. . A window
then pops up, showing all available options for ax. . Move down with the
slider bar along the alphabetically sorted list, and you will find several objects
dealing with ticks.


• In order to get to those objects more quickly, you might continue typing, e.g. ax.ti.
Although what we were looking for, i.e. tick_params, is already here, you
may be disappointed to find only two objects starting with 'tick...'; but there are
many more. Select ax.set_ or ax.get_ and among those objects you will
find many for getting or setting 'tick'-related properties.

• Now, how to handle tick_params? If you double-click this object in the

configuration shown above, the line continuation will produce
ax.tick_params; now just add (. This will trigger a description of this
function in the object inspector panel:


With the search trick above, I exploited the fact that Python is an object-oriented
language: nearly everything in Python programs is an object, or an object of an
object, or an object of an object of an object. Objects are characterized by a
respective address in memory, and objects have properties, parameters,
functions, methods, i.e. other objects. In our example script ax is an object which
determines the design of the axes. To find out all the properties/parameters of ax
we could just add a line like:
print plt.getp(ax)

and run the program. As an output we obtain a print-out list, of which the last few
lines are reproduced here:
window_extent = TransformedBbox(Bbox('array([[ 0.2 , 0.2 ], ...
xaxis = XAxis(125.800000,100.600000)
xaxis_transform = BlendedGenericTransform(CompositeGen...
xbound = (0.0, 1.0)
xgridlines = <a list of 6 Line2D xgridline objects>
xlabel = time (<font name=Courier>\mu</font>s)
xlim = (0.0, 1.0)
xmajorticklabels = <a list of 6 Text xticklabel objects>
xminorticklabels = <a list of 0 Text xticklabel objects>
xscale = linear
xticklabels = <a list of 6 Text xticklabel objects>
xticklines = <a list of 12 Text xtickline objects>
xticks = [ 0. 0.2 0.4 0.6 0.8 1. ]
yaxis = YAxis(125.800000,100.600000)
yaxis_transform = BlendedGenericTransform(BboxTransformTo(...
ybound = (-1.0, 1.0)
ygridlines = <a list of 5 Line2D ygridline objects>
ylabel = amplitude (<font name=Courier>\mu</font>V)
ylim = (-1.0, 1.0)
ymajorticklabels = <a list of 5 Text yticklabel objects>
yminorticklabels = <a list of 0 Text yticklabel objects>
yscale = linear
yticklabels = <a list of 5 Text yticklabel objects>
yticklines = <a list of 10 Line2D ytickline objects>
yticks = [-1. -0.5 0. 0.5 1. ]
zorder = 0

Several of the parameters you will not understand; others are self-explanatory.

4.6 Continuing getting serious


Coming back to line 13 of the script: with this single line we apply the changes to
both the x- and the y-axis, hence the first parameter 'both'. The keyword
"labelsize" refers to the tick labels, not to be mixed up with the axis labels,
which we will turn to later. The size '24' (in points) was chosen by trial and error. Similarly,
the values for pad, length, and width were tested to give the best-looking
result.


Lines 15 and 16 are largely self-explanatory. You might check the chapter "Working
with text" in matplotlib's user guide for text formatting rules, e.g. for Greek
characters like the 'mu'.
Finally, with line 17, I added a horizontal line at y=0, as ax.axhline()
suggests.

4.7 Not yet perfect


Annoyingly, the last steps to perfection are the most time-consuming and frustrating
ones. My experience: first, I did not find a simple way to increase the linewidth of
the rectangle bounding the axes area. If you examine the last figure carefully, you
will recognize, particularly at the corners, that the ticks are thicker than the bounding
line. Nor did I find an elegant way to typeset x- and y-axis labels with mixed fonts
(Latin and Greek) resulting in a similar character size. Note that the 'µ' is considerably
smaller than the Latin characters. This violates my aesthetics!
After a while, I discovered the solution to the first problem:
plt.setp(ax.spines.values(), linewidth=1.5)
By studying several matplotlib tutorials, I came across the notion of a spine, and then I
succeeded.
The second problem caused a major modification of my program! See the script
below:
1 # -*- coding: utf-8 -*-
2 from __future__ import unicode_literals
3 import numpy as np
4 import matplotlib
5 matplotlib.rcParams['text.usetex'] = True
6 matplotlib.rcParams['text.latex.unicode'] = True
7 import matplotlib.pyplot as plt
8
9 t = np.arange(0.0, 1.001, 0.001)
10 y = np.cos(16.0 * np.pi * t) * np.exp(-t / 0.4)
11
12 mm = 0.1/2.54
13 fig = plt.figure(figsize=(200.0 * mm, 160.0 * mm))
14 ax = fig.add_axes([0.2, 0.2, 0.75, 0.75])
15 plt.setp(ax.spines.values(), linewidth=1.5)
16 ax.plot(t, y, color='black', linewidth=3)
17
18 ax.tick_params('x', labelcolor = 'white',
19 pad = 36, length=10, width=1.5)
20 ylim = ax.get_ylim()
21 xticks = ax.get_xticks()
22 for i in xticks:
23 plt.text(i,ylim[0] - (ylim[1]-ylim[0])/10.,
24 r'\text{'+str(i)+ r'}',
25 fontsize = 25, horizontalalignment='center')


... followed by an analogous block for the y-ticks and then lines 15-20 of my previous script.
Yes, I found no other way but to change the complete font family for the tick labels
and the axis labels. Anyhow, working with LaTeX and Unicode is more professional,
and once adopted, it will be useful in most future applications. For now it may be
sufficient to know which lines to add, i.e. cf. lines 2, 4, 5, and 6. But this has severe
consequences!
What was so elegant before, setting the axes properties with one line for both axes,
now has to be done individually for the x-axis and the y-axis. Shown in the script
excerpt above are the respective lines for changing the x-axis tick_params.
First of all we have to 'whiten' the original tick labels, because we will overwrite
them with the new fonts. Hence the keyword labelcolor = 'white'. The
padding has to be modified: pad = 36 for the x-axis and pad = 50 for the
y-axis.
Next we have to 'type' each tick label individually!
Now we are prepared for our first control flow statement: the for-statement.
In contrast to some other languages, Python's for-statement iterates over the items of any
sequence (a list or a string), in the order in which they appear in the sequence. In our
example, in line 22, xticks is the sequence containing the x-tick positions, since in
the line before we 'got' them from ax. These sequence items are assigned to i, one
after another. The indentation of the following lines (23-25) marks the block of
statements that will be performed one after another with the respective
item i, i.e. first with the first value of the sequence xticks, then with the next, and
so on.
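To see the iteration principle in isolation, here is a tiny self-contained sketch (my own
example, not part of the plotting script):

xticks = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
for i in xticks:
    # this indented block is executed once for every item i
    print i, str(i)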
Lines 23-25 are actually one code line. The first two parameters in plt.text()
are the x- and y-coordinates of the position where the text should be placed. The
x-coordinate is obviously the tick position given by xticks, i.e. 'i'. The y-coordinate
is calculated from the limits of the y-data, which we got before in
line 20. The y-position for each x-tick label is one tenth of the y-data span below y=0.
The third parameter defines the individual x-tick label. For this to happen, the value
'i' has to be converted into a string, and this in turn wrapped into the right text font.
Phew! Font size and alignment are self-explanatory. Note: for the y-labels you should
use 'right' for the horizontal alignment.
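For completeness, here is my guess at what the analogous y-tick block mentioned above
could look like, following the same pattern as the x-tick lines; the author's original lines
are not reproduced in the book, so details such as the vertical alignment are my assumption:

ax.tick_params('y', labelcolor = 'white',
               pad = 50, length=10, width=1.5)
xlim = ax.get_xlim()
yticks = ax.get_yticks()
for i in yticks:
    plt.text(xlim[0] - (xlim[1]-xlim[0])/10., i,
             r'\text{'+str(i)+ r'}',
             fontsize = 25, horizontalalignment='right',
             verticalalignment='center')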
Very cumbersome, indeed! But ...


4.8 Finally perfect!


Voilà, here is our final result:

Note that the original figure had a width of 20 cm and was downscaled to fit onto
this book's page.


...and here with the envisioned downscaling to a single text-column width of approx. 8
cm.


4.9 Do it yourself!
Learning by doing is the best way to progress, even if you may stumble from pitfall
to pitfall. Inspect the matplotlib gallery for figures you might want to adopt and
modify, and try to produce your own plots, e.g. by consulting the matplotlib
user guide and/or with a little help from your friends. Good luck!


Chapter 5
Dress to impress!
5.1 Aim: Color, 3D, interaction, and animation
After this very conservative, compulsory black-and-white imaging, you might be
eager to color up your life. Dress to impress will be our motto now. This is one of
the strengths of Python(x,y): it offers a vast number of tools which allow
breathtaking graphics, animations and interactions. This book can only
offer a glimpse of the possibilities. Nevertheless, this will surely motivate you to
dig deeper into this brave new world.

5.2 Searching for an example


Often, a good start is to look for an example and try to cannibalize it. I found a nice
one from 'ETS'. ETS stands for 'Enthought Tool Suite', and this is just another very
extensive toolbox. But how to find it?
At this stage I would like to make you aware of two sources of Python code which
are already stored on your computer after you have installed Python(x,y). However, as
time goes by, things might have changed (pathways, files, names, versions, ...). So
be prepared for unpleasant surprises. Then you have to become creative and find
your own ways, alas.
One of these sources contains the programs which make up the software of all the
Python(x,y) packages, like matplotlib, NumPy, VTK, etc. They are most
probably on your system disc 'C:' under 'Python27' (or a more up-to-date version):


e.g. following this path:


system(C:)>Python27>Lib>site-packages>enthought>mayavi>sources>
you may find the program 'array_source.py'. Usually you do not need to look at
these programs other than out of curiosity, and you should not change them unless
you have found a bug and want to fix it, because these are programs used internally by
the toolbox.
The examples we are looking for are hidden in a separate area on your system (C:)
disc under programs(x86) (or a more up-to-date folder name). Look for 'pythonxy' and
then for 'doc'. Under 'Libraries' you may find many other examples, but for now we
are interested in the folder 'Enthought Tool Suite'. Somewhere in this folder, there
should be a sub-sub-sub-folder with example code. Try to find the file
array_animation.py . An alternative way to find it might be to use the search
function of your computer.

If you have found this file, just double-click it and after a while - depending on the
power of your computer - the performance will begin: an animated 3D graph of a
function will appear, and after it finishes you will be able to tilt, rotate and scale this
virtual 'hat' interactively with the mouse. Try it!


5.3 Analyze this file


Now that you know the path to this file, keep it in mind. Open 'Spyder' and use its
'File explorer' to navigate to this file. Note: take care that 'Global working directory'
is activated in the toolbar and the respective directory chosen.
Double-click the file and the script will appear in the 'Editor'.
Probably you will have tried immediately to run this program, and most probably you
will have received an error message. This was not the case when I started this
program some time ago. But when I tried it again just now, I received the message
in the interpreter window shown on the next page.
These error messages are the most annoying part of programming. When pressing
the 'Run' button you hope the job is done, and then ... Especially when a program
worked properly before and a few weeks later does not anymore. Let's see what happened in
this special case.


Most confusing for a beginner are the many lines of error comments. As a rule of
thumb: often only the first and last lines are important. In this case the last blue line
indicates that the program execution stopped in the module __init__.py in line 17.
By double-clicking the blue line, which is actually a link to the respective module,
the script of that module appears in the editor window as shown in the figure above. Line
17 contains the string sip.setapi('QString', 2). Now, what went wrong?
As already mentioned: the last red line ValueError: API 'QString' has
already been set to version 1 is the essential error message.
Well, I have to confess, I did not understand the full meaning of this message, but I
concluded that I had to replace the '2' in line 17 - and consequently in line 18 as
well - by a '1'. Then I ran this modified module __init__.py. By doing this, the
module is automatically saved and the original module is overwritten! Then I started
array_animation.py again and: success! I was lucky. Such maneuvers could just as well
lead to disaster. Later I found out, by discussing this point with the Python(x,y)
Discussion Group
(see: http://code.google.com/p/pythonxy/wiki/Support), that the new version of
'Spyder' had caused this mix-up of QString versions. Well, all this might not happen
nowadays anymore, but similar annoying incidents surely will. I described this one to
give some hints on how to help yourself.
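For illustration only, the relevant lines in that __init__.py then read roughly as follows
(a sketch from memory; line 18 is presumably the companion call for 'QVariant', and the
surrounding code depends on your version):

import sip
sip.setapi('QString', 1)    # was: 2
sip.setapi('QVariant', 1)   # was: 2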
But let us turn to the program itself:


1 #!/usr/bin/env python
2 """A simple example showing an animation. The example
3 illustrates a few things.
4 1. You can pass a numpy array of scalars and use it directly
5 with tvtk.
6 2. The tvtk arrays are views of numpy arrays. Thus changing
7 the array in-place will also change the underlying
8 VTK data.
9 3. When changing the numpy data you must call `modified` on
10 a relevant tvtk object.
11 The example is a little contrived since there are better
12 ways of achieving the same effect but the present form nicely
13 illustrates item 2 mentioned above.
14 """
15 # Author: Prabhu Ramachandran <prabhu_r@users.sf.net>
16 # Copyright (c) 2004-2007, Enthought, Inc.
17 # License: BSD Style.
18
19 from tvtk.api import tvtk
20 import numpy
21 import time
22
23 # First create a structured points data set.
24 sp = tvtk.StructuredPoints(origin=(-10., -10., 0.0),
25 dimensions=(80, 80, 1),
26 spacing=(0.25, 0.25, 0.0))
27
28 # Create some nice data at these points.
29 x = numpy.arange(-10., 10., 0.25)
30 y = x
31 r = numpy.sqrt(x[:,None]**2+y**2)
32 # We need the transpose so the data is as per VTK's expected
33 # format where X coords vary fastest, Y next and then Z.
34 try:
35 import scipy.special
36 z = numpy.reshape(numpy.transpose(5.0*scipy.special.
37 j0(r)), (-1,) )
38 except ImportError:
39 z = numpy.reshape(numpy.transpose(5.0*numpy.sin(r)/r),
40 (-1,) )
.
.

We will analyze only these first few lines. Everything that follows will initiate the
visualization and animation, topics we will come to in later chapters. Here we just
exploit them for our purposes.
This example is a good archetype of how to write a program script. First, it introduces and describes what the program is made for. Then the author gives his name and e-mail address and mentions copyright and license conditions. Throughout the script, helpful comments are inserted. In addition, the program is well structured, easy to understand, and easy to alter for other purposes.


But first let us try to understand the program itself before we modify it. The actual program starts in lines 18-20 (I am referring to the figure above; the line enumeration in your editor may be different) by importing the modules needed for our task: tvtk, numpy, and time. Then a data set of 80 * 80 points in a plane in 3D-space is defined, with its coordinate origin at the x-, y-, z-coordinates -10.0, -10.0, 0.0 respectively. The spacing between points is 0.25. Hence, we may conclude that the plane will reach from -10.0 to -10.0 + 80 * 0.25 = +10.0 in the x- and y-directions. So far, nearly self-explanatory.
Now we turn our attention to line 28, which I would like to explain in more detail. It will be more convenient to analyze this line interactively in the IPython window. You do not need to retype the program lines in that window; just copy and paste lines 18 to 28 from the editor to the interpreter. Now the last line in your interpreter should be line 28 from the editor, but with a different line number in the interpreter count.
Next type 'x', in order to find out what has been assigned to 'x' by
numpy.arange(-10., 10., 0.25).
In [10]: x = numpy.arange(-10., 10., 0.25)

In [11]: x
Out[11]:
array([-10.  ,  -9.75,  -9.5 ,  -9.25,  -9.  ,  -8.75,  -8.5 ,  -8.25,
        -8.  ,  -7.75,  -7.5 ,  -7.25,  -7.  ,  -6.75,  -6.5 ,  -6.25,
        -6.  ,  -5.75,  -5.5 ,  -5.25,  -5.  ,  -4.75,  -4.5 ,  -4.25,
        -4.  ,  -3.75,  -3.5 ,  -3.25,  -3.  ,  -2.75,  -2.5 ,  -2.25,
        -2.  ,  -1.75,  -1.5 ,  -1.25,  -1.  ,  -0.75,  -0.5 ,  -0.25,
         0.  ,   0.25,   0.5 ,   0.75,   1.  ,   1.25,   1.5 ,   1.75,
         2.  ,   2.25,   2.5 ,   2.75,   3.  ,   3.25,   3.5 ,   3.75,
         4.  ,   4.25,   4.5 ,   4.75,   5.  ,   5.25,   5.5 ,   5.75,
         6.  ,   6.25,   6.5 ,   6.75,   7.  ,   7.25,   7.5 ,   7.75,
         8.  ,   8.25,   8.5 ,   8.75,   9.  ,   9.25,   9.5 ,   9.75])

In [12]: x[0], x[4], x[80], x[-1]


------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
IndexError: index out of bounds

In [13]: x[0], x[4], x[79], x[-1]


Out[13]: (-10.0, -9.0, 9.75, 9.75)


What we learn is that 'x' has been assigned a linear array spanning from -10.0 to 9.75, contrary to our first assumption. Consequently, input [12] in the printout above led to an error message, because 'x[80]' does not exist: if one defines 80 values and the index starts with '0', then the last index is '79'. Hence, input [13] gives a valid output, showing the values of the array 'x' at the indices 0, 4, 79, and -1. Index -1 is the convenient equivalent for the last index of an array.
This in turn suggests, that we should write x = numpy.arange(-10.,
10.25, 0.25) and change the dimensions of the point data set to 81 by 81, if we
want to obtain a truly symmetric picture.

5.4 Some tricks with arrays


Back to the editor: line 29 only declares that 'y' should be the same array as 'x'. More exciting is the next line. We may ask: why
r = numpy.sqrt(x[:,None]**2+y**2)
and not
r = numpy.sqrt(x**2+y**2)?
Let us try again to answer this question with the interpreter:
In [11]: x = numpy.arange(-10., 10.25, 0.25)

In [12]: y = x

In [13]: r = numpy.sqrt(x**2+y**2)

In [14]: r
Out[14]:
array([ 14.14213562, 13.78858223, 13.43502884, 13.08147545,
12.72792206, 12.37436867, 12.02081528, 11.66726189,
11.3137085 , 10.96015511, 10.60660172, 10.25304833,
9.89949494, 9.54594155, 9.19238816, 8.83883476,
8.48528137, 8.13172798, 7.77817459, 7.4246212 ,
7.07106781, 6.71751442, 6.36396103, 6.01040764,
5.65685425, 5.30330086, 4.94974747, 4.59619408,
4.24264069, 3.8890873 , 3.53553391, 3.18198052,
2.82842712, 2.47487373, 2.12132034, 1.76776695,
1.41421356, 1.06066017, 0.70710678, 0.35355339,
0. , 0.35355339, 0.70710678, 1.06066017,
1.41421356, 1.76776695, 2.12132034, 2.47487373,
2.82842712, 3.18198052, 3.53553391, 3.8890873 ,
4.24264069, 4.59619408, 4.94974747, 5.30330086,
5.65685425, 6.01040764, 6.36396103, 6.71751442,
7.07106781, 7.4246212 , 7.77817459, 8.13172798,
8.48528137, 8.83883476, 9.19238816, 9.54594155,
9.89949494, 10.25304833, 10.60660172, 10.96015511,
11.3137085 , 11.66726189, 12.02081528, 12.37436867,
12.72792206, 13.08147545, 13.43502884, 13.78858223,
14.14213562])


In [15]: size(r)
Out[15]: 81

In [16]: r = numpy.sqrt(x[:,None]**2+y**2)

In [17]: size(r)
Out[17]: 6561

In [18]: r
Out[18]:
array([[ 14.14213562,  13.96647772,  13.79311422, ...,  13.79311422,
         13.96647772,  14.14213562],
       [ 13.96647772,  13.78858223,  13.61295339, ...,  13.61295339,
         13.78858223,  13.96647772],
       [ 13.79311422,  13.61295339,  13.43502884, ...,  13.43502884,
         13.61295339,  13.79311422],
       ...,
       [ 13.79311422,  13.61295339,  13.43502884, ...,  13.43502884,
         13.61295339,  13.79311422],
       [ 13.96647772,  13.78858223,  13.61295339, ...,  13.61295339,
         13.78858223,  13.96647772],
       [ 14.14213562,  13.96647772,  13.79311422, ...,  13.79311422,
         13.96647772,  14.14213562]])

As shown by the interpreter outputs, the first expression for the root of the sum of squares gives a linear array, i.e. a vector with 81 elements. The expression applies x[0] and y[0] to yield r[0], x[1] and y[1] to yield r[1], and so on.
However, the second expression with x[:,None] instead of x yields a two-dimensional array with 81 * 81 elements! The trick here is: x[:,None] inserts a new axis and thus turns the vector x into a column of shape (81, 1). When this column is combined with y (shape (81,)), NumPy "broadcasts" the two shapes against each other, so the sum of squares becomes a square 81 x 81 array. Try it for yourself with the interpreter. In the end, the tiny letter 'r' represents a huge matrix!
In conclusion: with only three lines of code (lines 29-31) we have obtained the Euclidean distances from the origin for all 81 * 81 points of an xy-plane. With such a construct we have a very convenient means to visualize rotationally symmetric functions in quasi-3D, as we will see just now.
Lines 36 and 39 obviously describe two different rotationally symmetric functions. Before we try them, why not comment the respective lines out with '#' and replace them by a simple function z = f(r) of our own? cf.:
#try:
# import scipy.special
# z = numpy.reshape(numpy.transpose(5.0*scipy.special.j0(r)),
(-1,) )
#except ImportError:
# z = numpy.reshape(numpy.transpose(5.0*numpy.sin(r)/r),(-1,))

z = r

Run this modified program and ... as suspected: a cryptic error message!


Well, let us try the function in line 39 instead:


#try:
# import scipy.special
# z = numpy.reshape(numpy.transpose(5.0*scipy.special.j0(r)),
(-1,) )
#except ImportError:
# z = numpy.reshape(numpy.transpose(5.0*numpy.sin(r)/r),(-1,))

#z = r

z = numpy.reshape(numpy.transpose(5.0*numpy.sin(r)/r), (-1,) )

Here we nearly succeeded! Only a warning appeared: RuntimeWarning: invalid value encountered in divide, indicating what we see in the displayed figure: a "hole" at the origin of the function due to the undefined expression sin(0)/0. We can fix it by adding the line z[len(z)/2 + 40] = 5.0.
Coming back to our problem with z = r: after carefully comparing the lines we should come to the conclusion to replace z = r by z = numpy.reshape(r, -1). Now we succeed. Try to find out the difference between the two assignments for z.
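As an alternative to patching one element by hand, one could avoid the 0/0 right away. This is only a minimal sketch of that idea (the grid definition is repeated from above; numpy.where and numpy.errstate are standard NumPy tools, and the intermediate name z2d is mine):

import numpy

x = numpy.arange(-10., 10.25, 0.25)
y = x
r = numpy.sqrt(x[:, None]**2 + y**2)

# substitute the limit value 5.0 (= 5*sin(r)/r for r --> 0) at the origin
with numpy.errstate(invalid='ignore'):
    z2d = numpy.where(r == 0.0, 5.0, 5.0 * numpy.sin(r) / r)
z = numpy.reshape(numpy.transpose(z2d), (-1,))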


Finally, let us come back to the function in line 35: scipy.special.j0(r). "Ask" your interpreter with scipy.special.j0? and you will find out that it returns the Bessel function of the first kind of order 0. More details are given in the SciPy documentation.
By the way, we became aware of a convenient Python statement
try:
...
except ImportError:
...

In other words: during execution of the program the statements of the first block are "tried", and if an error of the type specified after "except" (in this case an ImportError) occurs, then the statements of the second block are executed instead. In our case: if scipy.special could not be imported during execution, then the sin(r)/r function would be selected instead of the Bessel function.
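The same pattern is handy whenever a package is optional. A minimal sketch (the flag name is of course just an example):

try:
    import scipy.special
    have_scipy = True
except ImportError:
    have_scipy = False

print "scipy.special available:", have_scipy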


5.5 Do it yourself
By now you have learned how to produce a colorful pseudo-3D image of a rotationally symmetric function of your choice. You may ask whether this works for non-rotationally symmetric functions as well. Yes, in quite a similar way. Try it yourself, starting perhaps from the sketch below. An example is shown in the figure: the lateral distribution of the z-component of the magnetic field of a current dipole.
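I will not spoil the dipole exercise here, but as a starting point the same three-line construct works for any z = f(x, y). A minimal sketch with a simple, non-rotationally symmetric saddle function (the function and its scale factor are my choice, not the book's):

import numpy

x = numpy.arange(-10., 10.25, 0.25)
y = x
z2d = 0.05 * (x[:, None]**2 - y**2)   # a saddle: not rotationally symmetric
z = numpy.reshape(numpy.transpose(z2d), (-1,))
# 'z' can now replace the rotationally symmetric data in the animation script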


Bake your own .py!

Your own py-package!


Chapter 6
Bake your own .py!
6.1 Aim: creating a package with your own programs
Are you a bit frustrated now, after working through the fourth chapter? All this typing just for a simple figure! Well, this is the unpleasant part of programming. The more rewarding part is: once you have created a useful script, you may use it again and again, just by changing a few arguments.
This time you typed all those code lines; next time, when you have other data or a new function to display as a figure, you would like to simply snap your fingers, call "figure!", and have the new image ready, wouldn't you? You can come close to that. But first, you have to bake your own .py.

6.2 Step by step, starting simple


Let us start the way we know, with the plot of a sine function:


When we use the standard matplotlib.pylab plot function, then the default settings
come into action and the blue sine plot results. If we want a red sine function, we
may override the default color (c.f. Figure 2 above).
But if we would like to modify a lot of figure properties, as e.g. in the last chapter,
one way would be to generate a completely new matplotlibrc file according to
your own taste as is described in the matplotlib documentation (User’s Guide,
Chapter "Customizing matplotlib").
However, this is of little help if you need different individual "standard" figure layouts for different purposes - for a scientific journal, for a poster, for presentations, you name it. Then it is more convenient to prepare a corresponding number of program variants, e.g. my_APL_fig.py, my_poster_fig.py, my_talk_fig.py, and to replace matplotlib.pylab in code line 6 by e.g. my_simple_figure.
In order to learn how such programs, in the following termed "modules", can be structured and how they interact with your calling program, I prepared a few examples and start with a very simple my_simple_figure.py module.


Please compare this to the desktop of the preceding page and note the similarities and the differences. Let us start with the side effects first. For displaying the figures I worked in the first instance with the Python console and in the second desktop with the IPython console. Both - as often in life - have their pros and cons. With the one you get external, independent figure panels, which you sometimes have to find behind the Spyder window, but which you can resize and edit; however, you have to close them manually before continuing. With the IPython alternative the figures show up directly in the Spyder IPython console.
Another trick utilized above: when clicking the right mouse button on the top rim of
the Spyder Editor pane, then -among other choices- one can split the pane. A very
convenient feature, if you want to inspect and work with two scripts that relate to
each other. For instance an old and a new version of a program or -like in our case
here- to compare the programs which interact with each other.

This brings us to our main point: how does my_sine.py - the "calling" program - interact with my_simple_figure.py - the "service" module?
Let's recall what our aim was: we wanted to replace line 6 of the old calling program, import matplotlib.pylab as sf, by import my_simple_figure as sf in the new calling program, in order to implement our own default figure properties. In this very simplified example the default line color becomes green and the default line width becomes 6 points. ... and - with line 11 in the calling program - it works!!
But ... overriding the default with line 12 - as we could manage with matplotlib - causes an error message here. What happened?


Well, when calling sf.plot we refer to the function def plot(x,y) in lines 7-9 of the service program, and this is not the same as the plot function of matplotlib.pylab (this is why line 8 of the service program refers to plt.plot(...)). The error message indicates that for sf.plot() no keywords are defined ... yet. No problem, we may alter the plot function in my_simple_figure.py as follows, and this issue is solved:
7 def plot(x, y, color = 'g'):
8 plt.plot(x, y, color, lw=6)
9 plt.show()
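Since the two scripts themselves appear only as screenshots in the figures, here is a minimal sketch of what they might look like after this change; the file names follow the text, but the exact contents and line numbering are my assumption:

# my_simple_figure.py -- the "service" module (sketch)
import matplotlib.pyplot as plt

def plot(x, y, color='g'):
    # our own defaults: green line, width 6
    plt.plot(x, y, color, lw=6)
    plt.show()

# my_sine.py -- the "calling" program (sketch)
import numpy as np
import my_simple_figure as sf

x = np.linspace(0.0, 2*np.pi, 200)
y = np.sin(x)

sf.plot(x, y)               # uses the module's own defaults
sf.plot(x, y, color='r')    # override works once the keyword has been added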

You may take a short rest here, if you are confused now, and recapitulate the different dependencies: when we write sf.plot, then sf is the alias for the my_simple_figure.py module, as assigned by line 6 in my_sine.py. The appendix .plot then refers to the plot function defined in the my_simple_figure.py module, i.e. to line 7 there, whereas plt.plot in the following line 8 refers to the plot function of plt, i.e. of matplotlib.pyplot.
I could have made life easier for you if I had chosen in line 7 of my_simple_figure.py the name my_plot instead of plot for the function, and consequently sf.my_plot in lines 11 and 12 of my_sine.py. However, it happens quite frequently that the same name is used for different objects, and one has to take care to discriminate correctly and to understand in which "namespaces" a name is valid.
By the way, for those who learn Python on the fly in this book: we have just been introduced to the convention for defining a function. The first line starts with the Python keyword def, followed by the function name (which you may choose), followed by parentheses containing parameters and keyword arguments ("kwargs"), and finalized by a colon (c.f. line 7 in our example above). The next lines describe what the function is intended to do, and they are all indented by a few spaces. The function definition ends when the indentation ends. Here the two indent/unindent tools in the Spyder toolbar are very convenient for indenting or unindenting multiple selected lines at once.

6.3 Class with style!


Now that we have learned what a function is, we may progress to the concept of a "class", since until now our homemade "simple figure" program was far too primitive to be useful. Again, we continue with a rather simplified and more "pedagogic" variant, my_simple_figure_0.py, with a corresponding variant my_sine_0.py as the calling program. In this way we dive deeper into the merits of "object oriented programming" (OOP). -oops!


On this desktop we split the Editor pane horizontally. Another trick: in lines 18 and 23 we used the backslash "\" for line continuation, in order to keep the pane slim. But let us concentrate on the script codes.
Start by considering my_simple_figure_0.py as a black box and inspect what that module does in my_sine_0.py. What immediately catches the eye: My_Figure is called twice and two different names are assigned to it: f1 in line 13 and f2 in line 17. We created two different "objects", which have different properties and different abilities to perform methods, although they stem from the same "black box" module My_Figure. Properties and abilities of objects are called "attributes", and these attributes are referred to when writing e.g. f1.lc, i.e. "object(f1)"."attribute(lc)". Similarly, f1.plot(...) refers to the ability of f1 to plot, i.e. it refers to the plot function defined in the black-box module.
My_Figure is a kind of inactive template in which properties, abilities, and methods for objects are written down. Such objects come to life by instantiation ("initiation") and assignment to an object name. The notation for creating an object can be seen in lines 13 and 17 (note the parentheses!). The attributes may be changed, as done in line 18, and they remain valid until the end of the program if not changed or deleted intentionally. Thus, the second time f2.plot(...) is called, a repeated instantiation of f2 is not needed.
When you run the program my_sine_0.py, it plots three times: first a black sine function of amplitude 1 using f1, then a red sine function of amplitude 0.8, and finally a red inversion of the latter sine function, the last two drawn by f2 (c.f. the figure in the IPython console).
At this stage it may not be obvious, what the advantages of this approach are,
compared to the method introduced in the previous subchapter. But in practice, the
modules and classes are much more complex and contain lots of attributes.


When instantiating the objects in a calling program, one then picks and modifies only those attributes one needs in the respective circumstances. You will soon appreciate this convenience when our programs become more advanced.
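The module itself is again shown only as a screenshot. As an orientation while reading the following explanation, here is a minimal sketch of what my_simple_figure_0.py and my_sine_0.py might contain; the structure follows the text and the extended listing of sub-chapter 6.5, but the details and line numbers are my assumption:

# my_simple_figure_0.py -- the "black box" service module (sketch)
import matplotlib.pyplot as plt

class My_Figure:
    def __init__(self):          # runs when an object is created
        self.lc = 'k'            # default line color: black
        self.lw = 3.0            # default line width

    def plot(self, x, y):
        if len(x) <= 20:         # few samples --> add filled markers
            self.lc = self.lc + 'o'
        plt.plot(x, y, self.lc, lw=self.lw)

if __name__ == '__main__':       # only active when run stand-alone
    fig = My_Figure()
    fig.plot([0, 1, 2, 3, 4], [-0.5, -0.25, 0.0, 0.25, 0.5])

# my_sine_0.py -- the calling program (sketch)
import numpy as np
from my_simple_figure_0 import My_Figure

x = np.linspace(0.0, 2*np.pi, 200)
y = np.sin(x)

f1 = My_Figure()     # first object, keeps the default attributes
f1.plot(x, y)

f2 = My_Figure()     # second object
f2.lc = 'r'          # override its default line color
f2.plot(x, 0.8*y)
f2.plot(x, -0.8*y)   # f2 keeps its attributes, no new instantiation needed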
Now let us turn to the ominous black box, the module my_simple_figure_0.py, and ignore for a while the last four lines in that script. What is new for us is the class, which is characterized by the header line class "name of the class": followed by an indented block indicating the extent of the class. The class usually contains several statements and/or method definitions (functions); in our very simple case only two functions. Before we inspect them, this omnipresent and dubious self should be explained. This is a convention of Python programs. You may consider that it stands for whatever object is created from My_Figure - when called from my_sine_0.py, either f1 or f2.
The first function, __init__, relates to another Python convention: it is a special method with a reserved name, and it is called automatically when an object is created. In other words: when e.g. in my_sine_0.py the object f1 is created (line 13), then one could say that in line 10 of My_Figure "self" "means" f1, which is now initialized. Not only that, but - as lines 11 and 12 declare - the two default attributes of f1 are also defined: in our case the line color "black" and the linewidth "3.0". In the back of your head you may replace self by f1 or f2 respectively and memorize "f1.lc = 'k'" or "f2.lw = 3.0", etc. You may recall that in my_sine_0.py the line color attribute of f2 was changed to red in line 18.
The definition of the plot function looks a bit different from the one in the preceding subchapter. Inside the parentheses this dubious self shows up again, stating - if it could speak - "this is the plot function for whatever object is created from this template". Note that only x and y are the remaining parameters here; these names are local, i.e. only valid within this function. For now, we ignore
lines 15-19. Line 20 calls the plot function of plt and here the arguments for
line color and linewidth are listed in two different conventions for didactical reasons.
Note, that the values declared in lines 11 and 12 are inserted here.
So far so good. If you save (not run!) my_simple_figure_0.py now and then save and(!) run my_sine_0.py, you will get the desired multi-sine figure.
Did you try to run my_simple_figure_0.py as well? You may have been surprised, because you then activated the code we have ignored until now. In a sense, this program exhibits a Janus head: when it is called by another program, it serves as described above; nevertheless, it can also act as a stand-alone program, or in other words: as a '__main__' program.


The decision which face of the Janus head applies is made by code line 23. If the program is called by another program, then it is not the main program and the answer to line 23 is "No!", more precisely the boolean False. If my_simple_figure_0.py is run as a stand-alone program, then the answer is "Yes!", or better: the boolean True, and lines 24-26 become active.
Here an object of the class My_Figure is created and assigned to the name fig, and then fig.plot is called. Now self in the class stands for fig, and x, y are five value pairs. Thus, the condition in line 15 becomes True, and the definition of marker symbols comes into play, producing the respective graph.

The main advantage of this "if __name__ == '__main__':" trick is that, when developing a program which is intended to be a service program, one can test most of its functions immediately, without the need to run the programs which shall call it later.

6.4 Remote control


As you get more and more experienced, you will create more and more folders for various projects with programs which should call your my_simple_figure_0 program. Up to now this would only work reliably if my_simple_figure_0 is in the same folder as the calling program. A better solution is to create a package of all your custom-made service programs. You could call this package my_py, or choose any other name, and locate it e.g. in the site-packages folder of python(x,y).


I do not know how safe it is there when you update python(x,y) to a new version, so you had better save a backup of this folder now and then.
Inside your my_py folder will be your collection of custom-made service programs, at present only my_simple_figure_0 and an __init__.py file. The latter may even be empty, i.e. contain no code at all. However, one could of course add some defaults here.

Then you just have to replace line 8 in my_sine_0.py by
from my_py import my_simple_fig_0 as mf
and in lines 13 and 17 replace My_Figure() by mf.My_Figure().
Now you have established the "remote control" for your individual figure-drawing program!
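For orientation, the resulting folder layout and import might look as sketched below (the exact location of site-packages depends on your installation; the file names follow the text):

site-packages/
    my_py/
        __init__.py          # may be empty
        my_simple_fig_0.py   # your service module with class My_Figure

# in the calling program
from my_py import my_simple_fig_0 as mf

fig = mf.My_Figure()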


6.5 Do it yourself!
Now it is again your turn. I called my_simple_figure_0 a "simple figure" module because it is simple, far too simple... For general purposes, you would need a much more elaborate program which can deal with many different attributes. It should provide quite a number of default properties which, on the other hand, should be easy to override if desired. As a start, you could begin as follows:
1 import matplotlib.pyplot as plt
2
3 class My_Figure:
4 def __init__(self): # Initialize "My_Figure"
5 self.lc = 'k' # with some defaults
6 self.lw = 3.0
7 self.q = False
8
9 def questions(self, qs):
10 self.q = qs
11 if self.q == True:
12 self.lc = raw_input("line_color = ")
13 self.lw = raw_input("line_width = ")
14
15 def line_width(self, lw=1.0):
16 self.lw = lw
17
18 def line_color(self, lc = 'g'):
19 self.lc = lc
20
21 def plot(self, x, y):
22 if len(x) <= 20: # for few samples -->
23 self.lc = self.lc + 'o' # filled symbol "o"
24 plt.plot(x, y, self.lc, lw=self.lw)
25
26 if __name__ == '__main__':
27 fig = My_Figure()
28 fig.lc = 'r'
29 fig.line_width(lw=6.0)
30 fig.plot([0,1,2,3,4],[-0.5,-0.25,0.0,0.25,0.5])

Here a few variants are incorporated whose features become clear, when playing a
bit with the following calling program:
1 import numpy as np
2 from my_py import my_simple_fig_1 as mf
3
4 x = np.linspace(0.0,2*np.pi,200)
5 y = np.sin(x)
6
7 fig = mf.My_Figure() # initializes "My_figure"
8 fig.questions(True)
9 #if omitted, then the defaults of My-Figure are applied
10 #fig.lc = 'r' # or you override the default color
11 #fig.line_color('b') # or you call the function to change
12 # the color
13 fig.plot(x,y)


The most novel feature is the interactive communication between you and the
program. If you run the calling program, then questions pop up in the Spyder
console, asking for the line color and the line width:

Alternative ways to override the defaults are given in lines 10 and 11. Uncomment them for a try and analyze how the two programs interact with each other.
I have produced my own general-purpose my_figure.py and will use it in all the following chapters. You should write your own, because you will have different tastes and requirements than I do. This will be a long-term effort, because you inevitably cannot know what purposes your program will have to meet in the future, and inevitably your program will produce many error messages at the beginning, because you did not consider this or that.
One difficulty is to foresee the many variants for the scaling of the axes. The optimal number of ticks, scientific notation for very small or very large values, negative values, maximum and minimum values of the scales, etc. will demand clever programming in order to achieve nice and professional-looking figures.
Try your very best!



Will you still feed me?

Physionet


Chapter 7
Will you still feed me?
7.1 Aim: make external data available for your
programs
Just for fun, try this small script:
1 import urllib
2 opener = urllib.FancyURLopener({})
3 f = opener.open(
4 "http://www.azlyrics.com/lyrics/beatles/
5 whenimsixtyfour.html")
6 text = f.read()
7 i = text.find('feed')
8 print text[i-40:i+7]
9 print text[i+14:i+34]

This will produce the popular refrain of the famous song of the Beatles:
Will you still need me, will you still feed me,
When I'm sixty-four?
Well, "feeding" your programs with data from "outside" is an essential ingredient of
programming. The small example above showed, how to import a text fragment
from a web page, and below we will learn, how to enter data from an Excel sheet,
pixel values of an image, and ECG-signals from a public database.

7.2 Point by point


In practice, drawing the figure of a function does not happen that frequently. More often we will need to read data points from a file and display them in a suitable manner. As an example, let us read data from an Excel file and make a drawing point by point. If you want to work with the same values as in this example, then you may copy the sheet displayed below. But of course, you can try the following with your own Excel file.
There are special packages in the python(x,y) site-packages which are designed to handle Excel files in a more professional manner: "xlrd" and "xlwt". As a quick start, and in order to learn some Python features, you may continue as follows: a convenient way is to convert the '.xls'/'.xlsx' file to the 'comma separated value' ('.csv') format.


This is done by opening the '.xls'/'.xlsx' file and saving it as file type '.csv'. Just compare the contents of both file types, for instance by opening the '.csv' file with Windows' 'Editor' or 'Wordpad' using the option 'all files':

The values we are interested in are those in columns C and D starting from row 4. In
the '.csv'-file the values, which once belonged to different columns are now
separated by semicolons. Why not by commas, as the name 'comma separated
values' suggests? Well, this is due to the regional settings, in this case for Germany,
where the decimal point is commonly a comma, and thus another separator is needed
to avoid ambiguities. In other regions, the delimiter symbol may be different, e.g. the
comma.


Here is the code that cracks the '.csv'-file. Note the value chosen for the delimiter
';':
1 import csv
2 path = "C:/... your path.../"
3 rdr = csv.reader(open(path + "data.csv"), delimiter=';')
4 #x = []
5 #y = []
6 i = -1
7 for row in rdr:
8 i = i + 1
9 if ((i > 2) and (row[2] != '')):
10 print row
11 break
12 # x.append(float(row[2]))
13 # y.append(float(row[3]))
14 # print "%5.1f %5.1f" % (x[-1], y[-1])
15 # elif (i == 1):
16 # xl = row[2]
17 # yl = row[3]
18 # elif ((i > 2) and (row[2] == '')):
19 # break

First we import a module called csv. This contains a function named 'reader'. Thus, in line 3, csv.reader() reads the rows of our '.csv' file, which is opened by the Python open() function. We should not forget the keyword 'delimiter' and note which separator symbol our '.csv' file uses - in this case the semicolon. Finally we assign the object returned by csv.reader() to the variable rdr. So far, line 3.
For now let us ignore the commented lines and inspect the reader object rdr in more detail. With the for...in statement we may extract one row after another from rdr. Since we are only interested in table values starting in the fourth row, a counting index i is introduced, and with the if statement we make sure that the print statement is not executed before the fourth row has its turn. Note that counting starts with '0', thus i = 3 (which is i > 2) is the count for the fourth row. Hence, the output of the print statement is:
Hence, the output of the print-statement is:
['0.0', '0.0', '0.00', '0.00']

The break statement stops the execution; otherwise one row after the other would be printed. After this check, let us now collect the x- and y-values we are interested in. For this, 'comment' lines 10 and 11 and 'uncomment' those lines which have been commented out so far, i.e. lines 4, 5, and 12-19.
Lines 4 and 5 create empty lists for x and y. In lines 12 and 13 the respective values
are appended to the lists. Since in the '.csv'-tables the values are 'string'-variables, we
have to transform them to 'float'-variables. row[2] and row[3] are the x and y
values from the third and fourth columns (counting starts with 0!).


And because we want to see all x- and y-values, we print them out, this time with formatting instructions: f for a floating-point variable, 5.1 for a field width of 5 with 1 decimal place. [-1] indexes the last element of a list.
Now that we have collected the coordinates of all points, we are able to draw the figure the way we learned in the previous chapters.
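By the way, NumPy offers a shortcut for such simple numeric tables. A minimal sketch (the file name, the number of header rows, and the column indices are those of our example and would have to be adapted to your own sheet):

import numpy as np

path = "C:/... your path .../"
# columns C and D are column indices 2 and 3; the first three rows are skipped
x, y = np.genfromtxt(path + "data.csv", delimiter=';',
                     skip_header=3, usecols=(2, 3), unpack=True)

Note that empty cells would show up as nan values, which you might have to filter out afterwards.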

7.3 Pixel by pixel


With many image processing packages you can manipulate pictures without needing to know how an image file is constructed, nor having to deal with pixels. However, if you intend to work with images more seriously, then you will want access to the values of the image pixels and have them available as an array. Again, several roads lead to Rome. Here an elegant shortcut is introduced, using the imread function from the "Miscellaneous routines" of SciPy.
Check the SciPy Reference Guide for more details, which you can access via the
Spyder toolbar "Help" > "Installed Python Modules" > SciPy documentation.
The following program is written for a .tif image, but it would work for other image types as well (with slightly different results). If you would even like to use the same image of a histological cut of a left ventricular heart wall, then just scan the picture shown above and save it as test001.tif in TIFF format.
Of course, your image size, resolution, and pixel values will be slightly different.
1 import matplotlib.pyplot as plt
2 from scipy import misc
3 import numpy as np
4
5 path = "C:/... your path .../"
6 imarr = misc.imread(path + "test001.tif")
7 print np.shape(imarr)
8 print imarr[1000,1100,:]
9
10 line_profile = imarr[1400,:,:]
11 #line_profile = np.copy(imarr[1400,:,:])
12 x = np.arange(len(line_profile))/float(len(line_profile))
13
14 imarr[1400:1406,:,:] = [0.0,0.0,0.0]
15
16 plt.subplot(2, 1, 1)
17 plt.imshow(imarr[1200:1800,:,:])
18
19 plt.subplot(2, 1, 2)
20 plt.plot(x, line_profile[:,0], 'r')
21 plt.plot(x, line_profile[:,1], 'g')
22 plt.plot(x, line_profile[:,2], 'b')
23 plt.ylabel('values')
24
25 plt.show()
26 plt.savefig(path + "test002.tif", dpi = 1600)

The first three lines import the packages needed. misc is new for us and in line 6
you see the magic single code line, which transforms the .tif-file into a numpy
array named by us: imarr.


Do not continue with print imarr in order to find out what imarr looks like - it would generate an endless printout. Rather, prefer print commands like those in lines 7 and 8. Their output in the console panel
(1924, 2248, 3)
[237 66 134]

informs us about the shape of the array - it has the dimensions 1924x2248x3 - and about the value of the pixel at row 1000 and column 1100. The "value" of a TIFF pixel is a three-element array consisting of the "intensities" of the three base colors red, green, and blue of the RGB color system. By intuition, when looking at the image, we would agree that red is the dominant color component, with a noticeable bluish tint and a very faint green share.
The next lines produce the image shown below. We want to extract the color profile of the 1400th horizontal line (counted from top to bottom) of the image. imarr[1400,:,:] means: all RGB values of row 1400 (the first row is row 0!). In other words: the first element in the square brackets stands for row 1400, the second element : stands for all column indices from 0 to 2247, and the third element for the three color components indexed from 0 to 2. This sub-array we name line_profile. For the moment, we ignore the commented line 11. Line 12 builds the x-axis, containing 2248 values scaled to a maximum of 1.
Our figure consists of two subplots, one showing a part of our image with a black line at row 1400 to indicate where the line profile shown in the second subplot has been extracted. Since only the lower part of the original image is of interest here, line 17 uses imarr[1200:1800,:,:], the "slab" of rows 1200 to 1800. The black line appears because in code line 14 we blackened all pixels in the slab [1400:1406,:]. Black in RGB is [0,0,0].
In the lower subplot we draw the values of the red, green, and blue components of the respective pixels in row 1400. At the beginning and at the end of the row the image is close to white, hence the intensities for red, green, and blue are close to the maximum value 255. For most other pixels the red tissue dominates; only in the middle and at approx. x = 0.82 whiter connective tissue cells are present, and thus the blue and green values spike there.
But ...
You might have obtained a different subplot than the one shown here if you ran the program as printed, i.e. with line 10 active instead of the commented line 11. What happened? Well, I have explained that error already in chapter 2.8! With line 10, imarr and line_profile are names pointing partly to the same memory locations. If at a later stage - here in line 14 - some pixels are changed to black, then those memory cells store the values [0,0,0].


That means: not only is imarr[1400:1406,:] black, but line_profile[:,:] has become fully black as well. One can prevent this by copying imarr[1400,:,:], as done in line 11. Then line_profile fills and occupies other memory cells. With this change, you should be able to get the figure shown here. But please note that your figure is only saved if you are working with the Python console and not with the interactive IPython console.
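The difference between a view and a copy is easy to demonstrate with a small array; the following sketch (with an arbitrary little example array) shows the same effect in miniature:

import numpy as np

a = np.arange(6)
view = a[2:4]           # a view: shares memory with 'a'
copy = np.copy(a[2:4])  # an independent copy

a[2:4] = 0              # modify the original
print view              # [0 0]  -- the view follows the change
print copy              # [2 3]  -- the copy keeps the old values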

7.4 Beat by beat


Databases can be wonderful grab bags -- if you can read the data. Often they are
encoded for efficient storage and you have to find out, how to decode. One of my
favorite public databases is www.physionet.org, and the following example is
selected from it.


Open page
http://www.physionet.org/physiobank/database/ptbdb/
and look for patient001:

Download the first two files and store them in a folder. If you try to open the .dat or "data" file, e.g. with Editor or Wordpad, you will obtain very cryptic output. However, the .hea or "header" file should be readable, since it is a text file. Here you can find detailed information about the data, which you need when decoding the data file. On PhysioNet, numerous help pages and descriptions of programs for dealing with the database are provided. It would take some days to understand the basics; it is only worthwhile diving into this world if you really need to work extensively with these data files.


On the other hand, these are very typical examples of how data are stored in scientific repositories. This is why I selected this sample. To keep things short, I explain only the most important parameters contained in the first readable lines of the header file:
s0010_re 15 1000 38400
s0010_re.dat 16 2000 16 0 0 0 0 i
s0010_re.dat 16 2000 16 0 0 0 0 ii
s0010_re.dat 16 2000 16 0 0 0 0 iii
...

First line: the record named "s0010_re" contains signal traces from "15" ECG electrodes ("leads") with a temporal resolution of "1000" samples per second and 38400 "samples" in total per lead. Second line: "2000" is the scale of the signal amplitude (ADC units per millivolt), and "i" is the name of the lead. The following lines describe the remaining 14 leads accordingly.
With the following script, we extract these data from the header file, because we
want to have them at hand, when decoding the data file subsequently:
1 rd_file = open(your_path + 's0010_re.hea', 'r')
2 t1 = rd_file.readlines()
3 #print t1[0]
4 t2 = t1[0].split(' ')
5 t3 = t1[1].split(' ')
6
7 ch = int(t2[1])
8 print 'number of channels = ', ch
9 sps = int(t2[2])
10 print 'number of samples per second = ', sps
11 spl = int(t2[3])
12 print 'number of samples = ', spl
13 y_sc = float(t3[2])
14 print 'y-scale = ', y_sc
15
16 ch_names = []
17 for i in range(ch):
18 t4 = t1[i+1].split(' ')[-1]
19 ch_names.append(t4[0:-1])
20 print 'channel names', ch_names
21 rd_file.close()   # note the parentheses - without them the file is not closed

This script should be more or less self-explanatory. If you do not know what the different commands do, then you should insert appropriate print commands, similar to the one indicated in line 3, which clears up what you have done in line 2.
In contrast, the program below, which we will use to read the ECG signal data from the data file, is not at all self-explanatory. Even though I successfully extracted the signals in the end, I do not fully understand how I succeeded. However, what counts is the result!


My approach was "step by step" and "trial and error". If we tried to open the 's0010_re.dat' file like the header file in lines 1 and 2 above, we would not acquire the whole content. An alternative and more successful way is:
1 rd_file = open(your_path + 's0010_re.dat')
2
3 for line in rd_file:
4 print line[0:20]
5 print repr(line[0:10])
6 print ord(line[0]), ord(line[1]), ord(line[2]), \
7 ord(line[3]), ord(line[4])
8 break

With this for ... in ...: loop one really reads out the entire content of the file. The break statement prevents the loop from continuing beyond the first file line. The three print commands should give us an idea about the encoding and the values of the file content. In order to avoid overly long printouts, only the first few characters are chosen. print line[0:20] produces the cryptic output:

The repr() (built-in) function shows what the cryptic characters represent:

'\x17\xfe6\xfe\x1f\x00\xda\x01\xfc\xfe'

The escape sequence \x tells the insiders that we are dealing with hexadecimal numbers. You may search the internet for hexadecimal-to-decimal converters and you will find out that a hexadecimal '17' is a decimal '23', a hexadecimal 'fe' is equivalent to a decimal '254', and so on. Why there is suddenly a plain '6' puzzled me at first: repr() simply prints bytes that correspond to printable ASCII characters (here 0x36, i.e. '6') as they are. Anyhow, the ord() function does the job of translation automatically, as shown by the output of line 6, but only one character at a time.
23 254 54 254 31

This is why we have to add the following lines to our script, in order to collect all
entries in one list, the object data. Here we break the for-loop to restrict the
print output:
1 rd_file = open(your_path + 's0010_re.dat')
2 data = []
3 for line in rd_file:
4 i = -1
5 while (i <= (len(line)-2)):
6 i += 1
7 data.append(ord(line[i]))
8 break
9 print data


If we display these data with our standard figure commands, this will not help us much:

A slightly clearer result is achieved with the attribute fig.lc = 'k.':

This looks better, but is still not o.k.: the data points are 'clipped' above 255 and below 0, an indication that the available integers do not cover the whole signal amplitude range. I'll spare you all the following trial-and-error steps; there are so many possible coding variants. The codecs module of Python covers numerous of them (c.f. C:\Python27\Lib\encodings). Here is the proper way to get to the signal data:


1 import codecs
2 import numpy as np
3
4 rd_file = codecs.open(your_path + 's0010_re.dat', \
5 encoding='utf-16-le')
6
7 data = []
8 ch = 12
9 for line in rd_file:
10 i = -1
11 while (i <= (len(line)-2)):
12 i += 1
13 data.append(np.int16(ord(line[i])))
14
15 d = np.array(data)
16 d = d.reshape(len(d)/ch, ch)

This looks better, but not nice yet. The data in the list data are arranged in such a way that the first signal value of channel 'i' is followed by the first value of channel 'ii', which is followed by the first value of channel 'iii', etc. Hence, if you want to extract the whole signal train of one channel, e.g. channel 'v1', then it is convenient to change the list data into a numpy array, here named d, and to rearrange it - as in lines 15 and 16 - from a flat array of length len(d) into an array of len(d)/12 rows by 12 columns (nominally 38400 x 12). The reshaping uses 12 channels because the data file 's0010_re.dat' contains only the 12 channels from 'i' to 'v6', while the channels 'vx', 'vy', and 'vz', the orthogonal leads, are stored in the file 's0010_re.xyz'. Now it is easy to display any of the twelve ECG signals, e.g. d[:4000, 6], with the right units with the help of the parameters gained from the header file:
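A minimal sketch of such a display (assuming the array d and the header parameters sps and y_sc from the scripts above; column index 6 corresponds to lead 'v1' in the channel order of the header):

import numpy as np
import matplotlib.pyplot as plt

t = np.arange(4000) / float(sps)     # time axis in seconds
plt.plot(t, d[:4000, 6] / y_sc)      # divide by the gain to obtain millivolts
plt.xlabel('t / s')
plt.ylabel('lead v1 / mV')
plt.show()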


7.5 Do it yourself!
This exercise is quite demanding, but try to face the challenge. Use the data of the 's0010_re.dat' file from the preceding sub-chapter and compose an averaged beat out of the approx. 50 heart beats. If you do this for each of the 12 ECG channels and superimpose them in a so-called "butterfly plot", then you should obtain something like the following figure. If not, you may have done something wrong.


Like a Nobel laureate

NumPy


Chapter 8
Like a Nobel laureate
8.1 Aim: Solving a system of ordinary differential
equations
Since the title of this book promises to perform science, it is time to do so. We will tackle two problems involving the solution of differential equations. First we start with a straightforward approach, then we use a more common, more up-to-date way to perform the job.

8.2 First Example: The Hodgkin and Huxley equations


In 1963 A.L. Hodgkin and A.F. Huxley were awarded the Nobel prize for their famous research on conduction and excitation in the giant nerve axon of the squid. The essentials of their findings were published in their historical paper in The Journal of Physiology, and it is still a pleasure to read it, since it is clearly written and explains every aspect in great detail. Thus, many of their thoughts and calculations can easily be reproduced.


Central to their findings was the modelling of the action potential, i.e. the electric
nature of the universal signal that propagates through our nerves. Moreover, even the
electrocardiogram, which we investigated in the last chapter, is the result of a
summation of all action potentials of all heart muscle cells.
In their time, access to a computer was out of reach. Therefore, Huxley performed all calculations step by step on the Brunsviga calculating machine shown on the title page of the special celebration issue of The Journal of Physiology. A tedious job! For that alone they earn our respect. See for instance an excerpt from the user guide for this calculator on the right, where it is explained how to calculate the sum of two products. How easy things are nowadays! Let us recalculate their famous equations. For this, check their original paper, try to understand the basics of membrane currents, and look for the summary of equations on pages 518f in:

Nerve cells in their simplest form are tubular structures, i.e. a lipid double layer
termed "membrane" acts as a tube wall that separates the extracellular from the
intracellular environment. Those environments are electrolytic liquids of different
concentrations of ionic species, mainly sodium, potassium, and chloride ions. Since
the concentrations and composition of the extra- and intracellular electrolytes differ
from each other, a voltage drop occurs across the membrane. If the membrane consisted only of the insulating lipid double layer, then only spurious leakage currents would flow through it. However, cell membranes additionally carry numerous specialized protein configurations, so-called channels, which selectively allow ion currents to flow through, i.e. channels exist exclusively for sodium, or potassium, or chloride.


At the channel constriction, voltage-induced local conformational changes may activate (open) or inactivate (close) the channels. This may be characterized by the two states "A" (activated) and "I" (inactivated), whose relation is governed by the rate constants αr and βr (transition I → A with rate αr, transition A → I with rate βr).

αr and βr are voltage dependent and differ with ion species. We may define a gating parameter "r", the ratio of the number of activated channels to the number of all channels - and see later what this is good for. The rate of change of r is governed by the differential equation

    dr/dt = αr * (1 - r) - βr * r

With this equation we can calculate the temporal evolution of the gating parameter from a completely inactivated state towards an equilibrium value r00 = αr/(αr + βr) for an arbitrary choice of α and β:


1 import matplotlib.pyplot as plt
2 import numpy as np
3
4 t = np.arange(500)
5 r = np.zeros(500)
6 alpha = 0.5
7 beta = 2.
8 dt = 0.001
9
10 for i in t[:499]:
11 r[i+1] = r[i] + (alpha * (1. - r[i]) - beta * r[i]) * dt
12
13 r00=alpha/(alpha+beta)
14 print 'roo =', r00
15
16 r_th=np.where((r > r00*(1.-1./np.e)))
17 tau=r_th[0][0]*dt
18 print 'tau = ', tau
19 print '1./(alpha + beta) = ', 1./(alpha + beta)

Do you recognize, that code line 11 is equivalent to the differential equation given
on the previous page? Here 'dr' is replaced by r[i+1]-r[i]. Hence, with code
lines 10 and 11 we calculate from a start value r[0] = 0 at time t[0] = 0 the
next value r[1] at time t[1] = t[0] + dt. ... and so on for the next 499
values. The result is given in the following figure:

Please note the three printouts (lines 14, 18, and 19):
r00 is the equilibrium value, i.e. when r[i+1]-r[i] --> 0, hence line 13.


The time constant τ is by definition the time at which the gating parameter passes the threshold r00*(1-1/e), where 'e' is Euler's number. Important mathematical constants like π or 'e' can be found in numpy, i.e. as np.pi or np.e respectively. In line 16, a very convenient numpy function (np.where) is used to find where this threshold is crossed. The first element of r_th, i.e. r_th[0][0], is the index of the first sample above the threshold, and τ results from multiplying it by dt.
Line 19 shows that τ can also be expressed with the help of α and β.
However, life is not that simple. The experiments of Hodgkin and Huxley ("HH") showed quite different and characteristic behaviors of the channel conductance changes for potassium and sodium; c.f. the respective figures below, copied from their above-mentioned paper, left for potassium and right for sodium. From those results they concluded that one is dealing with multi-step processes, i.e. a four-step activation for potassium and a three-step activation combined with one inactivation step for sodium.

Thus, we obtain more realistic curves if we draw 'r**4' over 't' for potassium. Actually, HH replaced 'r' by 'n' in this case, i.e. 'n**4' (c.f. equations (6) and (7) in their paper). They did this in order to distinguish it from the activation ('m') and inactivation ('h') gating parameters for sodium. Consequently, the sodium conductance curves may be obtained by drawing 'm**3*h' over 't' (c.f. equations (14)-(16)). See the resulting curves below, and the sketch that follows:
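A minimal sketch of how such a curve can be produced from the little program above (the α and β values are the same arbitrary ones as before; the real HH rate constants are voltage dependent):

import numpy as np
import matplotlib.pyplot as plt

t = np.arange(500)
r = np.zeros(500)
alpha, beta, dt = 0.5, 2., 0.001

for i in t[:499]:
    r[i+1] = r[i] + (alpha * (1. - r[i]) - beta * r[i]) * dt

plt.plot(t * dt, r, label='r')
plt.plot(t * dt, r**4, label='r**4 (potassium-like, sigmoidal onset)')
plt.xlabel('t')
plt.legend()
plt.show()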


8.3 Putting everything together


After all these preliminaries, let us finally solve the coupled equations to obtain the
action potential signal. Starting point is the electric circuit model for the cell
membrane shown as Fig. 1 in HH's paper:

The membrane between the extracellular and intracellular domains is modelled as a capacitance 'CM'. Three ion currents, the sodium, potassium, and leakage currents 'INa', 'IK', 'Il', are driven by their respective equilibrium potentials 'ENa', 'EK', 'El' via voltage(!)- and time(!)-dependent conductances 'gNa', 'gK', 'gl'. Thus, the following terms make up the total current flow:

    I = CM * dV/dt + gNa * (V - ENa) + gK * (V - EK) + gl * (V - El)
    with gNa = Gmax_Na * m**3 * h  and  gK = Gmax_K * n**4

This is essentially equation (26) on page 518 of HH's paper. Also on this and the next page they collected all the other equations which form the basis of our following calculation (which HH did on their mechanical calculating machine!!). Compare the code below. You will recognize changes in some of the prefactors; this is due to taking more recent literature values into account.


1 import numpy as np
2 import matplotlib.pyplot as plt
3
4 dt = 0.01 # elementary time step 'dt
5 t = np.arange(0,2500,1) # variable arrays (time,
6 V = np.zeros( 2500 ) # voltage,
7 mhn = np.zeros( (3,2500) ) # gating parameters)
8
9 V[0] = -70. # start values
10 mhn[:,0] = ([0., 1., 0.]) #
11
12 E_leak = -54. #
13 G_leak = 0.3 #
14 E_Na = 60. #
15 Gmax_Na = 120. # electrical parameters
16 E_K = -90. #
17 Gmax_K = 36. #
18 C_M = 1. #
19 I_stim = -10. #
20
21 for i in t[:2499]: # the loop:
22
23 alpha_m = 0.1 * (V[i] + 40.) / (1. - np.exp(-(V[i] \
24 + 40.) / 10.))
25 alpha_h = 0.0027 * np.exp(-V[i]/20.)
26 alpha_n = 0.01 * (V[i] + 55.) / (1. - np.exp(-(V[i] \
27 + 55.) / 10.))
28
29 beta_m = 0.108 * np.exp(-V[i] / 18.)
30 beta_h = 1. / (1. + np.exp(-(V[i] + 35.) / 10.))
31 beta_n = 0.054 * np.exp(-V[i] / 80.)
32
33 alpha =( [alpha_m, alpha_h, alpha_n] )
34 beta =( [beta_m, beta_h, beta_n] )
35
36 mhn[:,i+1] = mhn[:,i] + (alpha * (1. - mhn[:,i]) - \
37 beta * mhn[:,i]) * dt
38
39 V[i+1] = V[i] - dt/C_M * (Gmax_Na * mhn[0,i+1]**3. \
40 * mhn[1,i+1] * (V[i] - E_Na) +
41 Gmax_K * mhn[2,i+1]**4. * (V[i] - E_K) +
42 G_leak * (V[i]-E_leak) + I_stim)

... and let us analyze the code starting from the end. Lines 39-42 are actually one line of code and - please compare - this is equivalent to the equation given above (HH's equation (26)); it has, however, been rearranged in order to calculate the voltage evolution. A collection of parameters is provided in lines 12-19.
Line 36 is actually nothing else but line 11 of the previous program at the beginning of sub-chapter 8.2!


The only difference is that r is replaced by mhn and that line 36 stands for three arrays at once: mhn is a 3x2500 matrix, and mhn[0,:] stands for the gating parameter m, mhn[1,:] for h, and mhn[2,:] for n. alpha and beta in line 36 are 3-element vectors, see lines 33 and 34, and their voltage dependencies are formulated in lines 23-31.
... and here is the result:
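The resulting action potential is shown in the figure. A minimal sketch of how to display it (assuming the arrays t and V and the step dt from the program above; the HH model works in milliseconds and millivolts):

import matplotlib.pyplot as plt

plt.plot(t * dt, V)
plt.xlabel('t / ms')
plt.ylabel('membrane voltage / mV')
plt.show()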

8.4 Nobel 2.0


Another Nobel laureate: Felix Bloch. To see his picture and his biography, check e.g.: http://www.nobelprize.org/nobel_prizes/physics/laureates/1952/bloch-bio.html.
... and another example of a system of coupled differential equations: the Bloch equations, which describe the dynamics of nuclear spins, i.e. of their magnetic moments in time-varying magnetic fields. They are the basis of NMR (Nuclear Magnetic Resonance), which in turn is the basis of MR imaging. In later chapters we will use just such MR images.
A short introduction to NMR: some nuclei, e.g. the proton, have the interesting property of a 'spin' and, connected to it, a magnetic moment m. This property may be modelled by its mechanical analogue, the gyroscope, which has an angular momentum.


If the mechanical gyro is tilted in the earth's gravitational field, then it performs a
precession rotary motion. Similarly, the nuclear spins perform a precession motion
with angular frequency ω0 if they are tilted against the direction of a magnetic field
B.

However, in the case of nuclear spins, the configuration shown in the figure above,
is not the natural one, since the magnetic momentum always tends to align with the
direction of the magnetic field.


If, by some trick, the magnetic moment has been forced out of alignment, then it will "relax" back into the aligned configuration within a time characterized by a relaxation time T1. In addition, another relaxation mechanism comes into play, characterized by T2.
The dynamics of the relaxation of the magnetization can be described with the Bloch equations. In the notation used in the code below (with the gyromagnetic ratio absorbed into the field components, a static field Bz, and a transverse field Bxy at phase angle psi), they read:

    dMx/dt = My*Bz - Mz*Bxy*sin(psi) - Mx/T2
    dMy/dt = Mz*Bxy*cos(psi) - Mx*Bz - My/T2
    dMz/dt = Mx*Bxy*sin(psi) - My*Bxy*cos(psi) - (Mz - M_0)/T1

One of the various equivalent forms can be found on Wikipedia. Compare this to the following code lines 6-9:


1 import numpy as np
2 # Definition of parameters:
3 Bz, Bxy, T1, T2, psi, M_0 = 6.0, 0.0, 5.0, 2.0, 0.0, 1.0
4 # the coupled equations:
5 def dM_dt(M, t=0):
6     return np.array([ M[1]*Bz - M[2]*Bxy*np.sin(psi) - M[0]/T2,
7                       M[2]*Bxy*np.cos(psi) - M[0]*Bz - M[1]/T2,
8                       M[0]*Bxy*np.sin(psi) - M[1]*Bxy*np.cos(psi)
9                       - (M[2] - M_0)/T1])
10
11 from scipy import integrate
12 t = np.linspace(0, 15, 1000)    # time
13 M0 = np.array([1.0, 0.0, 0.0])  # initial conditions
14 M = integrate.odeint(dM_dt, M0, t)
15
16 Mx, My, Mz = M.T

With effectively one command (lines 6-9) one calculates the values of all three components of m at once! The parameters in the above program are chosen for the case of relaxation after a 90°-flip (see lines 3 and 13). The result is shown below:


8.5 Do it yourself!
My proposal is that you try out and play around with the last two programs by changing the values of the parameters involved. Observe how the results are affected by plotting the respective data.
Another suggestion: in the last example we used the sub-package 'integrate' from scipy. There are many more valuable sub-packages and functions to be utilized for scientific computing. Check the SciPy Cookbook (http://wiki.scipy.org/Cookbook) for many stimulating examples.
Our last example above is just an adaptation of the script in the Lotka-Volterra tutorial
(http://wiki.scipy.org/Cookbook/LoktaVolterraTutorial?action=show&redirect=LoktaVolterraTutorial).



Let it roll!

VTK


Chapter 9
Let it roll!
9.1 Aim: Visualizing
In the previous chapter some of the illustrations have been made with the visualization
toolkit VTK. This is an extremely powerful resource: very professional,
comprehensive, and widely utilized in medical imaging, even by industry and
commercial users.
VTK is very, very complex indeed and it is complicated to learn. There is no open
source user guide available - to my knowledge at the time of printing. The official
VTK guides are quite expensive. This does not mean that the open information on
VTK is scarce. On the contrary! Check VTK's documentation
(http://www.vtk.org/doc/nightly/html/index.html) and you will be overwhelmed by
the material. However, for a complete VTK layman, most of it may be beyond
reach. Nevertheless, with some patience and determination and a lot of trial and error,
one might become a semi-experienced user, and this will already be sufficient to
obtain impressive results.
In this book, we will only be able to touch the basics and give some advice on how to
get deeper involved. In this chapter we learn how to assemble a few building blocks
into nice pictures and how to animate them, i.e. how to let them roll. In later chapters,
we learn more about 2D and 3D image processing with VTK.

9.2 Get started


In my opinion the best start is the cone tutorial, a set of six well-commented
scripts introducing some basic aspects of creating 3D models, their rendering and
animation. You can find them by searching for cone.py, cone2.py, ...,
cone6.py.


My favorite access to such example scripts is via 'VTK: Class to Examples'
(http://www.vtk.org/doc/nightly/html/pages.html). This is a very rich collection of
examples of VTK scripts written in different languages like 'C++', 'TCL', and
'Python', ordered alphabetically by their names. Select (double-click) 'Class To
Examples (C)' and scroll down to 'vtkConeSource'. There, under 'Python', you can
find the six cone programs.

Pick your choice and the script appears with a double-click. You may
download this program by right-click and save:


Try to open cone.py in the Spyder Editor and run it with the Python Console
active. Normally, a window like this one should pop up and the cone should rotate
for a while and then stop in this position.

Maybe, to your surprise, nothing like this seems to happen! Then the reason might
be that this window opened behind the Spyder window. You should minimize it if
you want to observe the action. However, another annoying thing might happen: it
may not be possible to switch off the cone-window with the -button.
Strange, but true: one can terminate the window by double-clicking at the Spyder
Console pane the -button, then the -button, and then the -button again. By
this, a kind of 'reset' occurs, and it is possible to continue.

The six cone-scripts are very well commented and you may already learn quite a
few VTK features from them: creating basic 3D geometries, mapping, rendering, animating,
interacting, window handling, etc. Since this is already demonstrated so nicely, I
would rather add some additional goodies like 'colors' and guide you to
some useful information linked with the created objects.


9.3 Color your life


Working with colors means fun... and why not have fun when performing
science? Here coloring acts not only as a decorative feature: color can be utilized
to visualize information, like height over sea level in geographic maps, or
temperature, velocity, strain, viscosity, etc. in physical modelling. We come to this
later...
We have already learned to select colors in the previous matplotlib chapter, where we
just named the colors or their respective abbreviations in keyword arguments.
Another way to define a color is the RGB method, mixing the three base colors
'red', 'green', and 'blue' with appropriate intensity ratios. E.g. the
background in the 'cone'-example window was (0.1, 0.2, 0.4), giving a rather deep
blue. However, hitting the right combination to produce just the color hue you want
can be quite cumbersome.
One really nice feature of VTK is the big selection of pre-defined colors. In this way
one does not need to find the right RGB combination for the preferred
color shade by trial and error. The respective module is 'colors.py', which can be found
under vtk.util. Below I copied a few lines of this module with the various
'shades of white'.
# Whites
antique_white = (0.9804, 0.9216, 0.8431)
azure = (0.9412, 1.0000, 1.0000)
bisque = (1.0000, 0.8941, 0.7686)
blanched_almond = (1.0000, 0.9216, 0.8039)
cornsilk = (1.0000, 0.9725, 0.8627)
eggshell = (0.9900, 0.9000, 0.7900)
floral_white = (1.0000, 0.9804, 0.9412)
gainsboro = (0.8627, 0.8627, 0.8627)
ghost_white = (0.9725, 0.9725, 1.0000)
honeydew = (0.9412, 1.0000, 0.9412)
ivory = (1.0000, 1.0000, 0.9412)
lavender = (0.9020, 0.9020, 0.9804)
lavender_blush = (1.0000, 0.9412, 0.9608)
lemon_chiffon = (1.0000, 0.9804, 0.8039)
linen = (0.9804, 0.9412, 0.9020)
mint_cream = (0.9608, 1.0000, 0.9804)
misty_rose = (1.0000, 0.8941, 0.8824)
moccasin = (1.0000, 0.8941, 0.7098)
navajo_white = (1.0000, 0.8706, 0.6784)
papaya_whip = (1.0000, 0.9373, 0.8353)
peach_puff = (1.0000, 0.8549, 0.7255)
seashell = (1.0000, 0.9608, 0.9333)
snow = (1.0000, 0.9804, 0.9804)
thistle = (0.8471, 0.7490, 0.8471)
titanium_white = (0.9900, 1.0000, 0.9400)
wheat = (0.9608, 0.8706, 0.7020)
white = (1.0000, 1.0000, 1.0000)
white_smoke = (0.9608, 0.9608, 0.9608)
zinc_white = (0.9900, 0.9700, 1.0000)


As an illustration I colored three cones in different blue hues in front of a
navajo_white background. For this I imported only the colors I needed:

from vtk.util.colors import sky_blue, cobalt, cornflower

As a test, try to write a program that produces just this image.
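Here is one possible sketch of such a program; the cone sizes, positions and window size are my own choices, not taken from the book's figure:

import vtk
from vtk.util.colors import sky_blue, cobalt, cornflower, navajo_white

renderer = vtk.vtkRenderer()
renderer.SetBackground(navajo_white)

# three cones, each with its own shade of blue, placed side by side
for i, color in enumerate([sky_blue, cobalt, cornflower]):
    cone = vtk.vtkConeSource()
    cone.SetHeight(2.0)
    cone.SetRadius(1.0)
    cone.SetResolution(30)
    mapper = vtk.vtkPolyDataMapper()
    mapper.SetInputConnection(cone.GetOutputPort())
    actor = vtk.vtkActor()
    actor.SetMapper(mapper)
    actor.GetProperty().SetColor(color)
    actor.SetPosition(0.0, 2.5*i, 0.0)
    renderer.AddActor(actor)

renWin = vtk.vtkRenderWindow()
renWin.AddRenderer(renderer)
renWin.SetSize(600, 300)
iren = vtk.vtkRenderWindowInteractor()
iren.SetRenderWindow(renWin)
iren.Initialize()
renWin.Render()
iren.Start()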

9.4 Where are the data?


If we do not only want to visualize objects, but also want to quantify the geometric
data, then we have to look behind the scenes. The question is: how can we access
these data? In order to find out, let us simplify the object under investigation: a very
low-resolution cone in wireframe display.
1 cone = vtk.vtkConeSource()
2 cone.SetHeight( 2.0 )
3 cone.SetRadius( 1.0 )
4 cone.SetResolution( 5 )
5 coneMapper = vtk.vtkPolyDataMapper()
6 coneMapper.SetInputConnection( cone.GetOutputPort() )
7 coneActor = vtk.vtkActor()
8 coneActor.SetMapper( coneMapper )
9 coneActor.GetProperty().SetRepresentationToWireframe()


A very informative way to learn about the properties of an object is to copy from the
Editor the lines that make up the object and paste them into the Console (do not
forget to import vtk as a first line). Then - important! - add an update (here:
cone.Update()).
Now you may print a data output: print cone.GetOutput().

Many seemingly useless lines appear. Only in the end a few lines make some sense:
Bounds:
Xmin,Xmax: (-1, 1)
Ymin,Ymax: (-0.809017, 1)
Zmin,Zmax: (-0.951057, 0.951057)
Compute Time: 149
Number Of Points: 6
Point Coordinates: 028EE7F0
Locator: 00000000
Number Of Vertices: 0
Number Of Lines: 0
Number Of Polygons: 6

Yes, the object is made up of 6 points, one at the tip pointing in x-direction and 5
points forming the pentagon base. In addition, the object is made up of 6 polygons: 5
triangles and one pentagon. However, the information about the point coordinates is
hidden behind a memory address. How to retrieve them? Well, try:
print cone.GetOutput().GetPoints().
or even better:
print cone.GetOutput().GetPoints().GetData().


You may want to know how I found out. Well, by trial and error! You can get help
step by step with the line continuation feature of the Console:

...and then:

Finally, the following lines in the Editor will read out the point coordinates into a
numpy array:
13 cone.Update()
14 points = cone.GetOutput().GetPoints().GetData()
15
16 p = np.zeros([points.GetNumberOfTuples(),\
17 points.GetNumberOfComponents()], dtype='float32')
18 points.ExportToVoidPointer(p)
19
20 print p


...and the print out is:


[[ 1. 0. 0. ]
[-1. 1. 0. ]
[-1. 0.309017 0.95105654]
[-1. -0.809017 0.58778524]
[-1. -0.809017 -0.58778524]
[-1. 0.309017 -0.95105654]]

q.e.d.!!

9.5 Another twist


After investigating this much-reduced model, let us turn to a more sophisticated
example. An interesting one I have found on the homepage VTK: Class To
Examples (T..V) under vtkTubeFilter. However, it is only available
written in C++:

A double-click on the marked line opens the script of

TubesWithVaryingRadiusAndColors.cxx with 141 lines of C++ code. I
have "translated" this into Python. You may compare the following lines with the
C++ code and you will recognize that Python is much leaner than C++, but in many
aspects quite similar. Hence, it is a bit tedious but not too difficult to derive a
Python program from a given C++ example.
1 import vtk
2 import numpy as np
3
4 # Spiral tube
5 nV = 256 # No. of vertices
6 nCyc = 5 # No. of spiral cycles
7 rT1 = 0.1 # Start tube radius
8 rT2 = 0.5 # Max tube radius
9 rS = 2.0 # Spiral radius
10 h = 10.0 # Height
11 nTv = 8 # Surface elements for each tube vertex
12


13 # Create points and cells for the spiral


14 points = vtk.vtkPoints()
15
16 for i in range(nV): # Spiral coordinates
17     vX = rS * np.cos(2 * np.pi * nCyc * i / (nV - 1))
18     vY = rS * np.sin(2 * np.pi * nCyc * i / (nV - 1))
19     vZ = h * i / nV
20     points.InsertPoint(i, vX, vY, vZ)
21
22 lines = vtk.vtkCellArray()
23 lines.InsertNextCell(nV)
24 for i in range(nV):
25     lines.InsertCellPoint(i)
26
27 polyData = vtk.vtkPolyData()
28 polyData.SetPoints(points)
29 polyData.SetLines(lines)
30
31 # Varying tube radius using sine-function
32 tubeRadius = vtk.vtkDoubleArray()
33 tubeRadius.SetName("TubeRadius")
34 tubeRadius.SetNumberOfTuples(nV)
35 for i in range(nV):
36     tubeRadius.SetTuple1(i, rT1 + (rT2 - rT1) * \
37         np.sin(np.pi * i / (nV - 1)))
38 polyData.GetPointData().AddArray(tubeRadius)
39 polyData.GetPointData().SetActiveScalars("TubeRadius")
40
41 # RGB array, varying from blue to red
42 colors = vtk.vtkUnsignedCharArray()
43 colors.SetName("Colors")
44 colors.SetNumberOfComponents(3)


45 for i in range(nV):
46     colors.InsertTuple3(i, int(255 * i / (nV - 1)), 0, \
47         int(255 * (nV - 1 - i) / (nV - 1)) )
48
49 polyData.GetPointData().AddArray(colors)
50
51 tube = vtk.vtkTubeFilter()
52 tube.SetInput(polyData)
53 tube.SetNumberOfSides(nTv)
54 tube.SetVaryRadiusToVaryRadiusByAbsoluteScalar()
55
56 mapper = vtk.vtkPolyDataMapper()
57 mapper.SetInputConnection(tube.GetOutputPort())
58 mapper.ScalarVisibilityOn()
59 mapper.SetScalarModeToUsePointFieldData()
60 mapper.SelectColorArray("Colors")

...plus the usual rendering stuff.
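If you want a complete, runnable script, "the usual rendering stuff" could look roughly like this sketch; the variable names and the background color are my own choices:

actor = vtk.vtkActor()
actor.SetMapper(mapper)

renderer = vtk.vtkRenderer()
renderer.AddActor(actor)
renderer.SetBackground(0.1, 0.2, 0.4)

renWin = vtk.vtkRenderWindow()
renWin.AddRenderer(renderer)
renWin.SetSize(600, 600)

iren = vtk.vtkRenderWindowInteractor()
iren.SetRenderWindow(renWin)
iren.Initialize()
renWin.Render()
iren.Start()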


9.6 Do it yourself!
Try to understand the last program by analyzing it systematically and with reduced
resolution. You may pass through the intermediate stages shown below.


Some physics

SciPy


Chapter 10
Some physics
10.1 Aim: Handling vectors; example: Biot-Savart law
Often calculations in mathematics, physics and engineering require more than just
basic operations. Here we will tackle an example involving a vector cross product:
the Biot-Savart law in differential form. It describes how the magnetic induction
field dB is connected to an electric current element I dl.
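The formula itself appears as a figure in the original; for reference, the differential Biot-Savart law reads

d\mathbf{B} = \frac{\mu_0}{4\pi}\,\frac{I\,d\mathbf{l}\times\mathbf{r}}{\lvert\mathbf{r}\rvert^{3}}

with r the vector from the current element I dl to the field point; the prefactor discussed in the next section is μ0/(4π).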

10.2 Preliminaries
Let us start with the prefactor marked by the red frame below and have a look at the
SciPy constants.


This is a nice resource derived from CODATA and very convenient, since you have
a vast selection of constants of high precision with their units swiftly at hand, as is
demonstrated by the few code alternatives shown below:
>>> from scipy.constants import pi, mu_0
>>> pi
3.141592653589793
>>> mu_0
1.2566370614359173e-06
>>> from scipy.constants import physical_constants
>>> physical_constants['mag. constant']
(1.2566370614359173e-06, 'N A-2', 0.0)
>>> from scipy.constants import codata
>>> codata.value(u'mag. constant')
1.2566370614359173e-06
>>> codata.unit('mag. constant')
'N A-2'
>>> codata.precision(u'proton mass')
4.4241920688564605e-08

10.3 Sidestepping by a few units


Friendly enough, scipy.constants served us with the units of the magnetic constant.
But - cross your heart - do you still know what the unit of magnetic induction is and,
if "yes", how it is connected to N, V, m, A, s? For this, too, the Python distribution has a
helping hand for us: numericalunits.py. It is not provided by the regular
download of python(x,y), but as an additional plug-in.


Lately, I experienced that the additional plug-in page of python(x,y) did not exist
anymore. No problem. Use pip to download it from PyPI - the Python Package Index.
For this you just type into the Command Prompt
"pip install numericalunits" and - in a jiffy - it is included in the collection of
the Python site-packages and you may start to utilize it.

Read the User Guide on the homepage of numericalunits and start e.g. like this:
>>> import numpy as np
>>> import numericalunits as nu
>>> nu.reset_units()
>>> mu_0 = 1.2566370614359173e-06 * (nu.N / nu.A**2)
>>> Idl = 1.0e-6 * (nu.A * nu.m)
>>> rr = 0.01 * 0.01 *(nu.m**2)
>>> B = mu_0 * Idl /rr /(4.0*np.pi) /nu.nT
>>> print B
1.0

Ergo, for the parameters chosen above, the absolute value for the magnetic induction
orthogonally above the current dipole Idl is 1.0 nT. Of course, for this configuration
we did not need to calculate with vectors.
An alternative to numericalunits is SciMath, which even works with
NumPy. Here all physical units are broken down to the SI base units.


The following script shows how to 'construct' the SciMath UnitScalars from the
Codata physical constants with derived units, and how to calculate with them:
1 from scipy.constants import codata, pi
2 from scimath.units.length import *
3 from scimath.units.force import *
4 from scimath.units.electromagnetism import *
5 from scimath.units.api import UnitScalar
6 import re
7
8 def Cd2SM(x,u):
9     u2u = {'N':'newton', 'A':'ampere', 'T':'tesla'}
10     us = re.split('[ ]', u)
11     i = -1
12     while us:
13         i += 1
14         uc = us.pop(0)
15         au = re.search('\w+', uc)
16         if i == 0:
17             a = UnitScalar(1.0, units = u2u[au.group(0)])
18         else:
19             e = re.search('(\d+)', uc)
20             if e:
21                 for j in range(int(e.group(0))):
22                     s = re.search('-', uc)
23                     if s:
24                         a = a /UnitScalar(1.0, units \
25                             = u2u[au.group(0)])
26                     else:
27                         a = a *UnitScalar(1.0, units \
28                             = u2u[au.group(0)])
29             else:
30                 a = a *UnitScalar(1.0, units = \
31                     u2u[au.group(0)])
32     a = a * x
33     return a
34
35 muv = codata.value('mag. constant')
36 muu = codata.unit('mag. constant')
37 mu_sm = Cd2SM(muv, muu)
38 print mu_sm, mu_sm.units
39
40 I = UnitScalar(1.0e-3, units='ampere')
41 dl = UnitScalar(1.0e-3, units='meter')
42 r = UnitScalar(0.01, units='m')
43 B = mu_sm / pi /4.0 * I * dl /r**2
44 print B, B.units
45 print B * tesla
46 print B.units /tesla

SciMath is a bit complicated to handle and the results are often puzzling if you are
not really familiar with the syntax. E.g. try to change in line .. the unit from meter
to cm and print out only print B. One would expect a change of 'B' by a factor of
1000. But no change at all!! Check print B.units and you may recognize
what happened.


Some comments:
line 1: importing the Codata constants and 'π'.
lines 2-5: a bit long-winded: you need to import the units for all categories
individually. In order to know what categories are relevant, you might be forced to
inspect dimensions.py, probably behind the path:
Python27\Lib\site-packages\scimath\physical_quantities\.
line 6: This imports Python's regular expressions module. Check the Python
documentation on how to deal with this fantastic text editing helper. A great support is
the python-regex-cheatsheet.pdf, which you may probably find at:
C:\Programme(x86)\pythonxy\doc\Python\.
lines 8-33: A function which converts the Codata notation into the respective
SciMath UnitScalar value.
line 9: Here we use a dictionary for the first time in this book, however, only a very
small one just for demonstration. In practice it would make sense to create a
dictionary with all possible combinations of key-value pairs.
lines 10, 15, 19, and 22 make use of re.
lines 35-38: Reading the value and unit for the magnetic constant from Codata and
expressing it as a derived UnitScalar.

lines 40-46: Further parameters are prepared and the magnetic induction B is
computed and printed in different ways.

10.4 Vectorization
Until now we played around with scalar quantities and relations. Let's get serious
with vectors and the cross product. In addition we want to know the distribution of
magnetic induction values all over the upper plane shown in the figure displayed at
the beginning of this chapter. ...later...
First let us start with calculating the vector of the magnetic induction for only one
position in the upper plane, e.g. at x = 0.04, y = 0.03, and z = 0.05. In the following
script rd stands for the distance vector r - r' and ra for its absolute value.
One could well calculate it by ra = np.sqrt(x**2 + y**2 + z**2), but
we are following more the vector calculation spirit as shown below:
1 import numpy as np
2 from scipy.constants import pi, mu_0
3
4 x = 0.04
5 y = 0.03
6 z = 0.05


7 rd = np.array([x,y,z])
8 ra = np.sqrt(np.sum(np.square(rd)))
9
10 pre_factor = mu_0/4.0/pi/ra**3
11
12 dl = np.array([0.001,0.0,0.0])
13
14 B = pre_factor * np.cross(dl,rd)

In line 7 the vector rd is created as a numpy array generated from the list
[x,y,z] containing the three vector components. Then rd can be treated like a
usual variable. E.g. in line 8 the absolute value is the root of the sum of the squares
of rd and in line 14 it is involved in the cross product with vector dl which leads
-together with the pre-factor- to the magnetic induction vector B.
Next, let us see what happens, if we calculate simultaneously for more than one
position:
4 x = np.array([0.04,0.03,0.02,0.01,0.0])
5 y = np.array([0.03,0.02,0.01,0.0,-0.01])
6 z = np.array([0.05,0.05,0.05,0.05,0.05])
7
8 rd = np.array([x,y,z])
9 print 'rd=', rd
10 ra = np.sqrt(np.sum(np.square(rd)))
11 print 'ra(wrong)=', ra
12
13 ra = np.sqrt(x**2 + y**2 + z**2)
14 print 'ra(right)=', ra

The resulting print-out shows, that the code in line 10 gives the wrong result as
verified by line 13!
rd= [[ 0.04 0.03 0.02 0.01 0. ]
[ 0.03 0.02 0.01 0. -0.01]
[ 0.05 0.05 0.05 0.05 0.05]]
ra(wrong)= 0.130384048104
ra(right)= [ 0.07071068 0.06164414 0.05477226 \
0.0509902 0.0509902 ]

What's wrong? The best way to find out is to separate line 10 and perform the three
steps np.square, np.sum, and np.sqrt one after another, either interactively
with the console or by printing out the result of each step, when working with the
editor. Then you will recognize, that np.square squares all elements of rd. This is
o.k. so far.
Next, np.sum is summing up all squared elements, which is of course wrong. We
only want to sum up the three vector components for each distance vector
individually. This can be done by specifying the keyword axis.


This figure shows what axis is concerned for what key value:
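A quick check at the console makes the same point:

>>> import numpy as np
>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.sum(a, axis=0)      # axis 0: collapse the rows, one sum per column
array([5, 7, 9])
>>> np.sum(a, axis=1)      # axis 1: collapse the columns, one sum per row
array([ 6, 15])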

Thus, line 10 should be replaced by


ra = np.sqrt(np.sum(np.square(rd), axis = 0)). Similarly, we
have to take care in the following lines which axes are concerned. Try to find out
what np.c_ does to the pre-factor, why one has to 'repeat' dl, and finally how the
transposition of rd to rt affects the choice of axisa and axisb in the cross
product:
10 ra = np.sqrt(np.sum(np.square(rd), axis = 0))
11
12 pre_factor = mu_0/4.0/pi/ra**3
13
14 dl = np.array([0.001,0.0,0.0])
15 ndl = np.repeat([dl],5,axis=0)
16
17 rt = np.transpose(rd)
18
19 B = np.c_[pre_factor] * np.cross(ndl,rt,axisa=1,axisb=1)
20 B = np.c_[pre_factor] * np.cross(ndl,rd,axisa=1,axisb=0)

With these preliminary preparations you might be able to understand the full
program calculating the magnetic inductance vectors for 80*80 positions in the
upper plane:
10 import numpy as np
11 from scipy.constants import pi, mu_0
12 from enthought.tvtk.api import tvtk
13
14 x0 = np.arange(-20., 20.1, 0.5)
15 x_coord = x0[:,None] * np.ones(81)
16 y_coord = np.transpose(x_coord)
17 z_coord = 5.0 *np.ones([81,81])
18
19 x = np.ravel(x_coord)
20 y = np.ravel(y_coord)
21 z = np.ravel(z_coord)


22
23 r = np.array([x,y,z])
24 ra = np.sqrt(np.sum(np.square(r), axis = 0))
25 pre_factor = mu_0/4.0/pi/ra**3
26 dl = np.array([0.001,0.0,0.0])
27 ndl = np.repeat([dl],np.size(x),axis=0)
28 B = np.c_[pre_factor] * np.cross(ndl,r,axisa=1,axisb=0)
29 z = B[:,2]*8000000000000.0
30
31 sp = tvtk.StructuredPoints(origin=(-20., -20., 0.0),
32 dimensions=(81, 81, 1),
33 spacing=(0.5, 0.5, 0.0))
34
35 sp.point_data.scalars = z

10.5 Do it yourself!
The last two statements are a preparation for a visualization with tvtk, leading to
the figure showing the distribution of z-component values of the magnetic induction
in a plane over a current dipole. Try to continue the code accordingly.
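This is not the tvtk continuation itself, but as a quick sanity check you might first view the same data with matplotlib (assuming the scaled array z from line 29 of the script above is available):

import matplotlib.pyplot as plt

Bz_plane = z.reshape(81, 81)            # z-component of B on the 81x81 grid
plt.imshow(Bz_plane, origin='lower', extent=(-20, 20, -20, 20))
cb = plt.colorbar()
cb.set_label('scaled B_z')
plt.xlabel('x')
plt.ylabel('y')
plt.show()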



Perfect fit?

ITK


Chapter 11
Perfect fit?
11.1 Aim: Image registration - or: how to understand the
ITK user guide
ITK is a most professional software toolkit for image processing; particularly for
registration and segmentation of medical images it is considered the gold
standard. Fortunately, there exists a very detailed open source user guide (see
shortcut in the figure below). Unfortunately, it is only C++-oriented and most example
programs are .cxx programs. One may find a few .py examples in the python(x,y) itk
doc test folder and some others on the itk examples page:
http://www.itk.org/Doxygen/html/examples.html. But even this C++-oriented user
guide mentions the advantages of Python. May I quote? "The advantage of interpreted
languages is that they do not require the lengthy compile/link cycle of a compiled
language like C++. Moreover, they typically come with a suite of packages that
provide useful functionality."

Still, I would like to motivate you, to study the user guide. As a little help to my
friends, I will "translate" step by step one chapter of the user guide from C++ to the
Python equivalent: "Chapter 3.2 "Hello World"-Registration" in
"InsightSoftwareGuide-Book2-4.6.0.pdf".


11.2 Hello World Registration: from .cxx to .py


Please read in parallel Chapter 3.2 of the ITK User Guide:

C++ (copy from User Guide):

"Translation" to Python:
either
1 import itk

... and then starting all itk-statements with the prefix itk.
or:
from itk import ImageRegistrationMethod,\
    TranslationTransform, MeanSquaresImageToImageMetric,\
    RegularStepGradientDescentOptimizer


C++:

Python:
Dimension = 2
PixelType = itk.F

This "translation" I found out by searching in the interactive line continuation mode
of the console.
FixedImageType = itk.Image[PixelType, Dimension]
MovingImageType = itk.Image[PixelType, Dimension]

What is missing in the user guide, but has to be added here, is to
instantiate the readers:
fixedImageReader = itk.ImageFileReader[FixedImageType].New()
movingImageReader = itk.ImageFileReader[MovingImageType].New()

All the last statements can be shortened as shown here:


2
3 fixedImageReader = itk.ImageFileReader.IF2.New()
4 movingImageReader = itk.ImageFileReader.IF2.New()

The user guide proposes to use these two images:

However, it is not so easy to find the respective .png-files.


Try: http://www.na-mic.org/svn/Slicer3-lib-mirrors/trunk/Insight/Examples/Data/


Here they are:

Let us read the first image and try to get some information about it:
5
6 fixedImageFileName='BrainProtonDensitySliceBorder20.png'
7 movingImageFileName='BrainProtonDensitySliceShifted13x17y.png'
8
9 fixedImageReader.SetFileName(fixedImageFileName)
10 movingImageReader.SetFileName(movingImageFileName)
11
12 fixedImageReader.Update()
13 movingImageReader.Update()

11.3 Intermezzo I: an odyssey


When I ran this short program, it ended with an error message:
>>> fixedImageReader.Update()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: C:\u\itk-git\_b\Modules\Remote\SCIFIO\src\\
itkSCIFIOImageIO.cxx:274:
itk::ERROR: SCIFIOImageIO(03CA2968): SCIFIO_PATH is not set.
This environment variable must point to the directory \
containing the SCIFIO JAR files
>>>

Typical! What looked like a short intermezzo turned into an odyssey. I did not
understand this message and all my efforts failed.


Posting this problem in the python(x,y) and ITK communities led only to advice
which did not help either. Since I knew that this program had worked some months
before, I uninstalled the current version of python(x,y) and installed older
versions. (Thanks to the mirror site at Kent University:
https://www.mirrorservice.org/sites/pythonxy.com/). Finally I succeeded with
python(x,y) version 2.7.5.2. Hopefully, when you read these lines, a newer update
has fixed the bug.

11.4 Continuation I:
Now the reader update works and we may continue with:
14
15 fixedImage = fixedImageReader.GetOutput()
16 movingImage = movingImageReader.GetOutput()

... and in order to see the read-in image, we could write a file:
writer = itk.ImageFileWriter.IF2.New()
writer.SetFileName('Test_1.png')
writer.SetInput(fixedImageReader.GetOutput())
writer.Update()

But... the output is disappointing; an error message:


writer.Update()
RuntimeError: C:\u\InsightToolkit-4.4.1\Modules\IO\PNG\src\\
itkPNGImageIO.cxx:503:
PNG supports unsigned char and unsigned short
>>>

... and the Test_1.png file is empty.

11.5 Intermezzo II: on pixel types


You can fix this problem by changing the pixel type in line 3 from .IF2 to
.IUC2, as the error message seems to suggest. ...and it works! But why is the
itk user guide predetermining float as the pixel type? The reason is that most of the
following calculations, like filtering, registration, etc., are done better or more precisely
with floating point numbers. Thus, when reading in the image, it is good to
choose .IF2.
But how do we then obtain a .png-file from this read-in image? The small test
program CastImageFilter.py, which you may find in:
C:\Program Files(x86)\pythonxy\doc\Libraries\itk\Tests\ will give you the right
hints.
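A possible solution sketch, modelled on the RescaleIntensityImageFilter/writer combination used at the end of this chapter (so the filter choice is my guess, not the one from CastImageFilter.py):

outputCast = itk.RescaleIntensityImageFilter.IF2IUC2.New()
outputCast.SetInput(fixedImageReader.GetOutput())

writer = itk.ImageFileWriter.IUC2.New()
writer.SetFileName('Test_1.png')
writer.SetInput(outputCast.GetOutput())
writer.Update()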


11.6 Continuation II:


After this short intermezzo, we proceed.

C++:


This is all much leaner with Python!


17
18 registration = itk.ImageRegistrationMethod.IF2IF2.New()
19 imageMetric = itk.MeanSquaresImageToImageMetric.IF2IF2.New()
20 transform = itk.TranslationTransform.D2.New()
21 optimizer = itk.RegularStepGradientDescentOptimizer.New()
22 interpolator = itk.LinearInterpolateImageFunction.IF2D.New()

C++:

Python:
23
24 registration.SetMetric( imageMetric )
25 registration.SetOptimizer( optimizer )
26 registration.SetTransform( transform )
27 registration.SetInterpolator( interpolator )
28 registration.SetFixedImage( fixedImage )
29 registration.SetMovingImage( movingImage )
30 registration.SetFixedImageRegion( fixedImage.\
31 GetBufferedRegion() )

I guess for the rest of this chapter I do not need to keep copying from the guide, but
can continue only with the respective "translation".
31
32 transform.SetIdentity()
33 initialParameters = transform.GetParameters()
34
35 registration.SetInitialTransformParameters(initialParameters)
53
54 optimizer.SetMaximumStepLength( 4.00 )
55 optimizer.SetMinimumStepLength( 0.01 )
56 optimizer.SetNumberOfIterations( 200 )
57
58 registration.Update()
59
60 finalParameters = registration.GetLastTransformParameters()


61 print "Final Registration Parameters "


62 print "Translation X = %f" % (finalParameters.GetElement(0),)
63 print "Translation Y = %f" % (finalParameters.GetElement(1),)
64 print "Number of Iterations: %i" % (optimizer.\
65 GetCurrentIteration())
66 print "Best Value: %f" % (optimizer.GetValue())

The result:
Final Registration Parameters
Translation X = 12.995881
Translation Y = 17.000056
Number of Iterations: 18
Best Value: 0.007453

Well done!

11.7 Observer
You may have recognized that some lines of the code were skipped above. This will
be made up for here. If one wants to know the intermediate results of the moving
image shifts, one should include an "observer":
36
37 n_Iter = []
38 n_Value = []
39 n_X = []
40 n_Y = []
41
42 def iterationUpdate():
43     global n_Iter, n_Value, n_X, n_Y
44     n_Iter.append(optimizer.GetCurrentIteration())
45     n_Value.append(optimizer.GetValue())
46     currentParameter = optimizer.GetCurrentPosition()
47     n_X.append(currentParameter.GetElement(0))
48     n_Y.append(currentParameter.GetElement(1))
49
50 iterationCommand = itk.PyCommand.New()
51 iterationCommand.SetCommandCallable( iterationUpdate )
52 optimizer.AddObserver(itk.IterationEvent(), iterationCommand)

Printing the results:


67
68 n_Value.pop()
69 n_Value.append(optimizer.GetValue())
70 n_X.append(finalParameters.GetElement(0))
71 n_Y.append(finalParameters.GetElement(1))
72
73 for i in range(optimizer.GetCurrentIteration()-1):
74     print " %i: %f, [ %f, %f]" % ( n_Iter[i], \
75         n_Value[i], n_X[i], n_Y[i])


Print-out:
0: 4499.453034, [ 2.928696, 2.724471]
1: 3860.836258, [ 5.627510, 5.676826]
2: 3450.677523, [ 8.855155, 8.039517]
3: 3139.415148, [ 11.799687, 10.746864]
4: 2189.974519, [ 13.362815, 14.428797]
5: 1037.883083, [ 11.291967, 17.851017]
6: 895.902267, [ 13.160248, 17.137206]
7: 19.454337, [ 12.326818, 16.584582]
8: 236.181016, [ 12.782416, 16.790569]
9: 38.133068, [ 13.183254, 17.089448]
10: 18.750728, [ 12.949041, 17.002016]
11: 1.149746, [ 13.073973, 16.997874]
12: 2.413224, [ 13.011492, 16.999416]
13: 0.058771, [ 12.949047, 17.002043]
14: 1.149698, [ 12.980279, 17.000994]
15: 0.173007, [ 13.011502, 16.999691]
16: 0.007453, [ 12.995881, 17.000056]

The translation path:


11.8 The final move


The only thing we know until now is how much we have to shift the moving image
in the x and y directions. Now we should really move the moving image to its final
position. This is done by:
75
76 resampler = itk.ResampleImageFilter.IF2IF2.New()
77 resampler.SetTransform( transform)
78 resampler.SetInput( movingImage )
79
80 region = fixedImage.GetLargestPossibleRegion()
81
82 resampler.SetSize( region.GetSize() )
83
84 resampler.SetOutputSpacing( fixedImage.GetSpacing() )
85 resampler.SetOutputOrigin( fixedImage.GetOrigin() )
86 resampler.SetOutputDirection( fixedImage.GetDirection() )
87 resampler.SetDefaultPixelValue( 100 )
88
89 outputCast = itk.RescaleIntensityImageFilter.IF2IUC2.New()
90 outputCast.SetInput(resampler.GetOutput())
91
92 outputImageFileName='BrainReg_Test_01.png'
93
94 writer = itk.ImageFileWriter.IUC2.New()
95 writer.SetFileName( outputImageFileName )
96 writer.SetInput( outputCast.GetOutput() )
97 writer.Update()


11.9 Do it yourself: very very demanding!


The "Hello World"-registration was of course a simple one. When you continue to
read the guide, you will recognize quite a number of different registration techniques
for different purposes. If you are looking for a real world problem: here is one on the
following page. These are two microscope images of the subsequent slices of heart
ventricle tissue cut with a microtome and fixed and stained. Although these are cuts
of very close proximity, the staining was quite differently effective. Another
problem: the slices are differently skewed. Thus one would need a deforming
registration. A real challenge! Do you dare? Scan the two images and try your best.
But before you start you may better digest the next chapter.


Glue

ITK-VTK


Chapter 12
Glue
12.1 Aim: exploit the best of two worlds
Since ITK is excellent for image processing but provides no visualization, it makes
sense to combine it with VTK. However, this liaison is very tricky, and with some
python(x,y) versions the glue did not work. Hopefully, this is now fixed. In the
following we may try to perform some interactive interplay between these two
worlds with an MR image of a human chest. This demonstrates only a small sample
of the many possibilities available.

12.2 Reading a DICOM image


DICOM is the professional format for medical images. It contains not only the
image grey scale values and parameters, but a lot of additional information about the
imaging machine (MR, CT, US, ...) and the patient data, position, etc. The following
files have been anonymized. There exists a special plugin called PyDICOM, but for
our purposes the vtk DICOM reader is sufficient.
For working through this chapter you may download some DICOM files from a
paper I published a few years ago together with colleagues:
www.ncbi.nlm.nih.gov/pmc/articles/PMC3037925/ . Take for instance 'Additional
file 4', download the ZIP-file, open and unzip until you find the image files.
Select one, e.g. IM_1000, and copy it to the folder containing the program with which
you want to read and process this image. A few lines of code and you get a whole
load of output:
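The "few lines" could look like this minimal sketch; it mirrors the console session shown a little further below:

import vtk

reader = vtk.vtkDICOMImageReader()
reader.SetFileName('IM_1000')
reader.Update()

# dumps the complete vtkImageData structure of the DICOM slice
print reader.GetOutput()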


You may obtain even more specific output, if you add e.g.:
7 print 'range: ', reader.GetOutput().GetPointData().\
8 GetArray('DICOMImage').GetRange()
9 print 'bounds: ', reader.GetOutput().GetBounds()
10 print 'extent: ', reader.GetOutput().GetExtent()
11 print 'spacing: ', reader.GetOutput().GetSpacing()
12 print 'dimensions: ', reader.GetOutput().GetDimensions()
13 size_x,size_y,size_z = reader.GetOutput().GetDimensions()
14 print 'size_x, size_y: ', size_x, size_y

... which gives:


range: (0.0, 2702.932373046875)
bounds: (0.0, 358.5, 0.0, 358.5, 0.0, 0.0)
extent: (0, 239, 0, 239, 0, 0)
spacing: (1.5, 1.5, 3.0)
dimensions: (240, 240, 1)
size_x, size_y: 240 240

range shows the incredible grey scale range of this MRI image. It contains
240x240 pixels with spacing between pixels of 1.5 mm. Since we read only one
image the data for the third dimension are not relevant here. Thus the chest scan
covered 358.5x358.5 mm².
Again the line continuation mode of the console was very helpful to find the right
information. See, how I found out the right statement in line 7 above by scanning
through the Get... extensions:


>>> import vtk


>>> reader = vtk.vtkDICOMImageReader()
>>> reader.SetFileName('IM_1000')
>>> reader.Update()
>>> a = reader.GetOutput()
>>> b = a.GetPointData()
>>> c = b.GetArrayName(0)
>>> c
'DICOMImage'
>>> c = b.GetArray('DICOMImage')
>>> d = c.GetRange()
>>> d
(0.0, 2702.932373046875)

All pixel grey scale values are stored in an array with name 'DICOMImage'.
Now let us view the image:
7 view = vtk.vtkImageViewer()
8 view.SetInput(reader.GetOutput())
9 view.Render()

Huh! Quite dark. We may change contrast and brightness by playing around with
these two additional lines:
9 view.SetColorLevel(400)
10 view.SetColorWindow(1500)
11 view.Render()


How about resampling?


6 resample = vtk.vtkImageResample()
7 resample.SetInputConnection(reader.GetOutputPort() )
8 resample.SetAxisMagnificationFactor ( 0, 3 )
9 resample.SetAxisMagnificationFactor ( 1, 3 )
10 view = vtk.vtkImageViewer()
11 view.SetInput(resample.GetOutput())
12 view.SetColorLevel(400)
13 view.SetColorWindow(1500)
14 view.Render()


12.3 Involving ITK


Although resampling could be done with VTK - as shown above - we will now jump
from VTK to ITK, resample there, and jump back to VTK:
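The corresponding code is shown as a figure in the original; a rough sketch of the round trip could look like this. Note that the glue class name itk.VTKImageToImageFilter and its .IF2 suffix are my assumption - only the reverse direction, itk.ImageToVTKImageFilter, appears verbatim later in this chapter:

# VTK image -> ITK image (glue class name is an assumption, see above)
filter = itk.VTKImageToImageFilter.IF2.New()
filter.SetInput(reader.GetOutput())
filter.Update()

# ... the ITK processing (e.g. resampling) would go here ...

# ITK image -> VTK image again, so it can be shown with vtkImageViewer
back = itk.ImageToVTKImageFilter.IF2.New()
back.SetInput(filter.GetOutput())
back.Update()

view = vtk.vtkImageViewer()
view.SetInput(back.GetOutput())
view.SetColorLevel(400)
view.SetColorWindow(1500)
view.Render()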

12.4 Interacting!
In order to interact, we have to include a few lines and write a short subprogram
Keypress, which is registered via
iren.AddObserver("KeyPressEvent", Keypress). When you now start
the program, the VTK window pops up; when you move your mouse cursor over
the image and press the key 'p' on your keyboard, then the position of the key press
event and the grey scale value of the image at this position are printed in the console
pane. Note that the position comes from a VTK object, the window, while the pixel value
comes from the ITK object 'filter'! See below.
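The subprogram itself is shown as a figure in the original; a sketch consistent with the ButtonEvent variant given next could be:

def Keypress(obj, event):
    key = obj.GetKeySym()
    if key == "p":
        xypos = iren.GetEventPosition()
        print 'xypos = ', xypos
        # the pixel value is taken from the ITK side of the pipeline
        value = filter.GetOutput().GetPixel([xypos[0], xypos[1]])
        print 'value = ', value

iren.AddObserver("KeyPressEvent", Keypress)
iren.Start()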


While we have just learned to interact, let us perform a different interaction as well.

Instead of pressing a keyboard key you could get a similar result by activating a
mouse button. For this you have to replace the subprogram Keypress by the
subprogram ButtonEvent and change the observer accordingly:
def ButtonEvent(obj, event):
    if event == "LeftButtonPressEvent":
        xypos = iren.GetEventPosition()
        print 'xypos = ', xypos
        value = filter.GetOutput().GetPixel([xypos[0], xypos[1]])
        print 'value = ', value

iren.AddObserver("LeftButtonPressEvent", ButtonEvent)


12.5 Smooth control by comparison


For the next step - a segmentation - it is advantageous to smooth the image. We do it
interactively with another key press and control the change by placing the original
image and the smoothed one side by side. Pressing key 's' several times (slowly)
increases the smoothing step by step, while key 'x' decreases it, until an error
message stops the whole sequence (check the printed output).
33 smoothing = itk.CurvatureFlowImageFilter.IF2IF2.New()
34 smoothing.SetInput(filter.GetOutput())
35 smoothing.SetNumberOfIterations( 5 )
36 img_3 = itk.ImageToVTKImageFilter.IF2.New()
37 img_3.SetInput(smoothing.GetOutput())
38 img_3.Update()
39 view_2 = vtk.vtkImageViewer()
40 view_2.SetInput(img_3.GetOutput())
41 view_2.SetSize(600,600)
42 view_2.SetColorLevel(400)
43 view_2.SetColorWindow(1500)
44 view_2.Render()
45 s_n = 10
46
47 def Keypress(obj, event):
48     global s_n
49     key = obj.GetKeySym()
50     if key == "s":
51         s_n = s_n + 10
52     if key == "x":
53         s_n = s_n - 10
54     print 's_n', s_n
55     smoothing.SetNumberOfIterations(s_n)
56     view_2.Render()
57 iren.AddObserver("KeyPressEvent", Keypress)
58 iren.Start()


12.6 Interactive segmentation


Insert the following lines:
36 cTF = itk.ConnectedThresholdImageFilter.IF2IF2.New()
37 cTF.SetInput(smoothing.GetOutput())
38 cTF.SetLower(-10.0)
39 cTF.SetUpper(50.0)
40 cTF.SetReplaceValue(2000.0)
41 cTF.SetSeed((10,10))

What does this filter do with your smoothed image? It starts from the pixel position
which you determine by setting the seed. Then it collects all adjacent pixels which
have a value between the declared lower and upper grey scale limits and replaces them
by the pixel value you set. The result is a connected region over a black background.
If you have chosen a bright replace value, then this connected region is white. Check
for yourself why the lower threshold value is slightly negative.

This task is really calling for interaction. Prepare the key press subprogram like this:
u = 5.0
l = -5.0
def Keypress(obj, event):
    global l, u
    key = obj.GetKeySym()
    if key == "u":
        u = u + 5.0
    if key == "z":
        u = u - 5.0
    if key == "l":
        l = l - 5.0
    if key == "k":
        l = l + 5.0
    if key == "p":
        xypos = iren.GetEventPosition()
        print 'seed position', xypos
        cTF.SetSeed((xypos[0], xypos[1]))
        v = img_1.GetOutput().GetPixel([xypos[0], \
            xypos[1]])
        u = v + 5.0
        l = v - 5.0
        print 'lower, upper value', l, u
    cTF.SetLower(l)
    cTF.SetUpper(u)
    view_2.Render()

... run the program and do the following "tour": place the cursor over the left arm in
the original image window and press key 'p'. This sets the seed position. Then press
the 'u' key repeatedly until the next picture appears (at an upper threshold value of
approx. 475.0).

Next press the 'k' key, which increases the lower threshold. The next two pictures
should be achievable. Finally increase the upper threshold again with the 'u' key until
you reach the equivalent of the last picture.


12.7 Do it yourself: remove the arm


The last picture above could act as a template with which you should be able to
separate the left arm region from the original MR-image. Hints: use the
itk.RescaleIntensityImageFilter, convert back to VTK and write a template file. Then
use the VTK image mathematics.


Back to school

SymPy


Chapter 13
Back to school
13.1 Aim: Doing some symbolic math
After all this imaging let us turn to a different topic: solving equations, integrals, etc.
Your abilities in this field may have faded - as have mine. Hence, it is convenient
that there exist some symbolic math programs. However, the packages
"Mathematica" and "Maple", just to name two, may be out of reach for you because
they are commercial. Then the open source SymPy could be a helpful surrogate in
most practical cases. Indeed, very demanding applications have been solved with
SymPy, from statistics to quantum mechanics, finite element computations, and so
on.
It is very challenging to work with SymPy. A very comprehensive documentation is
available on the internet and within the python(x,y) documentation. In the latter folder
there are also some examples for beginners, intermediate and advanced users. Thus
this chapter can be quite short and I will present only two simple applications I
employed recently.

13.2 Before we really start: preliminaries


As so often, the more difficult task is to get oriented and prepared. In this case, when
dealing with mathematics, particularly with vector algebra, it is better to work with
LaTeX instead of the so-called pretty printing. See the difference:
From the SymPy documentation example basic.py in the Editor ...
1 import sympy
2 from sympy import pprint
3
4 def main():
5     a, b, c = sympy.symbols('a b c')
6     e = ( a*b*b + 2*b*a*b )**c
7
8     pprint(e)
9
10 if __name__ == "__main__":
11     main()

... we get the following output in the Console:


         c
⎛     2⎞
⎝3⋅a⋅b ⎠

Pretty ugly this "pretty printing", isn't it?


However, if we use the IPython console, the world looks much nicer if LaTeX has
been set up:

Well, the interactive console may be fine for short sessions, but for more extensive
programming the Editor is to be preferred. For a nice output, we need to install the
whole TeX/pdf/ps machinery. This is not so easy. I had some difficulties to
understand WinShell and Acrobat Reader and ended up again and again with error
messages stopping my elan. I had more success with this combination:

For postscript output: GhostScript and GhostView; for pdf: Foxit Reader; for TeX:
MiKTeX, TeXLive (http://www.tug.org/texlive/acquire-netinstall.html); and as an
IDE: TeXnicCenter. It is important that you install TeXLive and the TeXnicCenter
after all the other packages, because during installation the wizard asks for the
paths to LaTeX etc.


After you have successfully installed all these packages, you should try to copy and paste
the program simple_check_latex.py from the SymPy documents example folder (the
program is inside the galgebra folder) into one of your projects and start it in the
Spyder Editor. The output in the Console indicates that a latex file named
"simple_test_latex.tex" has been created. If you now open the TeXnicCenter and
open this file, then you should choose LaTeX -> PDF as the Output profile and
press the "Build Output" button (see below). Most probably some error message will
appear, like in my case:

Maybe you have to give yourself the permissions to read and write in the MiKTeX
2.9 folder. Go to Program Files and right-click the MiKTeX 2.9 folder. Then click
Properties > Security Tab > Edit Button. Select Users (...) from the list and check all
grants. Or "listings.sty" is not found, as stated in the error message in the figure
above. Then you have to activate the MiKTeX package manager: hold the "Win"-key
and press "R", then enter "mpm", look for "listings" and install ("+"-button in the
menu bar).
Now: no errors! ... and an output file has been written: simple_test_latex.pdf, which
can be viewed with the Foxit Reader just by pressing the View Output (F5) button of
the TeXnicCenter. Here are the first few lines:


Similarly you would get a .ps file if you choose LaTeX -> PS as the output profile.

13.3 Three points and a plane in 3D


My first SymPy application: I wanted to get a function z = f(x,y) for calculating the
z-values over x- and y-coordinates, thus defining a plane in 3D given by three points
p, q, r. I needed this to eliminate a part of a 3D image, because it contained
structures which impaired the segmentation. Here the problem is illustrated:


Thus the starting point is a set of three equations, from which we have to eliminate the
parameters s and t:
x = p1 + s*(q1 - p1) + t*(r1 - p1)
y = p2 + s*(q2 - p2) + t*(r2 - p2)
z = p3 + s*(q3 - p3) + t*(r3 - p3)

SymPy-wise, the elimination of s from these three equations is done by solving


them for s:
1 from __future__ import print_function
2 from sympy import *
3 from sympy.galgebra import xdvi, Get_Program, Print_Function
4 from sympy.galgebra import Format
5
6 def plane_3D():
7     Print_Function()
8
9     x, p1, q1, r1, s, t = symbols('x p1 q1 r1 s t')
10     y, p2, q2, r2, s, t = symbols('y p2 q2 r2 s t')
11     z, p3, q3, r3, s, t = symbols('z p3 q3 r3 s t')
12
13     S1 = solve(p1 + s * (q1 - p1) + t * (r1 - p1) - x, s)
14     print('s =', S1[0])
15     S2 = solve(p2 + s * (q2 - p2) + t * (r2 - p2) - y, s)
16     print('s =', S2[0])
17     S3 = solve(p3 + s * (q3 - p3) + t * (r3 - p3) - z, s)
18     print('s =', S3[0])
19     return
20
21 def main():
22     Get_Program(True)
23     Format()
24     plane_3D()
25     xdvi('3D-plane_latex.tex')
26     return
27
28 if __name__ == "__main__":
29     main()

The main part of the program is written in the function plane_3D. All the rest at
the beginning and at the end is needed to produce the .tex output, and if all runs well,
a pdf output is generated automatically:


Eliminating t and isolating z can be done by inserting the following lines after
line 18:
19
20     T1 = solve(S1[0] - S2[0], t)
21     T2 = solve(S1[0] - S3[0], t)
22
23     Z = solve(T1[0] - T2[0], z)
24     # print('z =', Z[0])
25     A = collect(Z[0],x)
26     B = collect(A,y)
27     print('z=', B)
28     return

Note that I wrote S1[0] - S2[0] instead of S1 - S2 etc.! Find out for
yourself why.
The expression for z is quite long and nearly cracks the page rim. Of course one
could have easily made all these transformations manually with a pencil and some good
concentration, but errors might still happen. Here the programming gives some
confidence that everything is correct. Collecting terms for x and y, as done by
lines 25 and 26, improves the overview, but the remaining expression is still very
long. If we want to write this expression in two lines, we should modify the .tex file.
A very convenient way is to utilize the TeXnicCenter and open there the file
'3D-plane_latex.tex' which we created via xdvi in the script above. Now
you should see the latex code and somewhere the lines which contain the equation.
The figure below shows what I added: a split environment marked green and
an alignment marked red for handsomeness. Note the two backslashes for the line
break and the &-sign indicating the alignment.


Here is the result:


13.4 Bistability
Recently I had to prepare an introductory lecture on optical bistability and I wanted
to create an animation to demonstrate the switching behavior between such stable
states. As a prerequisite I needed a simplified model system and the equations
governing this model. The schematic model is shown below: an optical resonator
formed by two mirrors contains a crystal with nonlinear optical properties, i.e. its
refractive index n changes nonlinearly with light intensity.

When a laser beam with intensity Iin enters this resonator and the output Iout is
partly fed back to the crystal, then the optical path length n*d within the resonator is
altered due to the intensity dependent refractive index n(Ifb) of the crystal. The relation
between output and input intensity is the transmittance T, which depends on the
optical parameters of the resonator; it may be described by the so-called Airy
function:
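The formula itself is shown as a figure in the original; the simplified near-resonance form that the script in this section actually uses (read off from its print-out further below) is

T = \frac{I_{out}}{I_{in}} = \frac{1}{1 + r\,\varphi^{2}/t^{2}}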

with r and t being the reflectivity and transmittance of the resonator mirrors. Thus,
the transmittance of the whole system depends strongly on the phase of the wave
inside the resonator and the change of phase is directly related to the intensity
dependence of the crystal:
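This relation is again a figure in the original; in the symbols of the script below it reads

\varphi = \beta\,I_{a} - \varphi_{0} \qquad (\text{in the code: } \texttt{phi = bt*ia - p0})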


Inserting this phase-intensity relation into the Airy function gives an expression
which is easily solved for Iin:
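Matching the script's print-out further below (with I_a corresponding to ia and I_in to ie), this is

I_{in} = \frac{I_{a}}{T} = I_{a}\left(1 + \frac{r\,(\beta I_{a} - \varphi_{0})^{2}}{t^{2}}\right)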

... and we are able to draw a figure of the bistability characteristic:

When increasing the input light intensity Iin from zero, the system will at first follow
the green branch of the characteristic until the turning point of the "S"-curve. Since
the input intensity is still increasing, the system must "jump" to the upper stable
branch marked in red. Thus we observe a sudden switch of the output intensity.
When the input intensity is then decreased again, the system will follow the red
branch until it reaches the second turning point and Iout switches to the lower
branch.


While we are dealing with SymPy, we can easily determine the values of the
switching points:
6 def test():
7     r, t, p0, bt, ia = symbols('r t p0 bt ia')
8     phi, T, ie = symbols('phi T ie')
9     phi = bt*ia - p0
10     T = 1/(1+(r*phi**2)/(t**2))
11     print T
12     ie = ia/T
13     print ie
14     die = solve(diff(ie, ia), ia) # diff and solve for ia
15     print die
16     print ''
17     # fill in appropriate values for the parameters
18     ia0 = die[0].subs(bt, 0.9).subs(p0, 0.5).subs(t, 0.1).\
19         subs(r, 0.9)
20     ia1 = die[1].subs(bt, 0.9).subs(p0, 0.5).subs(t, 0.1).\
21         subs(r, 0.9)
22     print ia0
23     print ia1
24     Tn = T.subs(bt, 0.9).subs(p0, 0.5).subs(t, 0.1).\
25         subs(r, 0.9)
26     print ia0/Tn.subs(ia,ia0)
27     print ia1/Tn.subs(ia,ia1)
28     return

with the print-out:


1/(r*(bt*ia - p0)**2/t**2 + 1)
ia*(r*(bt*ia - p0)**2/t**2 + 1)
[(2*bt*p0*r - sqrt(bt**2*r*(p0**2*r - 3*t**2)))/(3*bt**2*r),
(2*bt*p0*r + sqrt(bt**2*r*(p0**2*r - 3*t**2)))/(3*bt**2*r)]

0.197972345138655
0.542768395602086
2.04335440563373
0.549238186958863

You may have recognized: we worked with some dirty tricks. Since the output
intensity is dependent on the input intensity and not the other way around, we should
have determined Iout = f(Iin).
And there is another reason why I wanted the more complicated expression: my
plan was to create an animation of a working point moving on the characteristic
while increasing Iin by constant steps per time. Well, I could have continued to use
the inverse function and determined the respective interpolations of Iin for each time
step. But the cleaner solution is an expression for Iout as a function of Iin.
Nothing easier than this with SymPy! Just two lines of code (I omitted r and t here to
show the principle without getting even more complicated):
a = solve((1 + (b*ia - p0)**2)*ia -ie, ia)
print('ia =', a[0])


But what an output! It cracks the page when latexed, and I had to do the "split and
align" trick like in the last sub-chapter, and even then had to tilt it to fit on this page:


I admit, I would have failed to determine this expression by myself. Thanks to
SymPy!!!
A bit of substitution makes it look a bit friendlier:


13.5 Do yourself symbolics!


Please read the excellent user guide of SymPy and with its help examine the
examples given for beginners, intermediate and advanced freaks in 'Program
Files(x86)/pythonxy/doc/Libraries/sympy/examples'.
How about some vector algebra ("galgebra", c.f. Sub-chapter 5.9 in Chapter 5,
SymPy Modules Reference) or quantum mechanics...?


Optimistic optimization

CVXOPT


Chapter 14
Optimistic optimization
14.1 Aim: Looking for the best, considering
circumstances
Real life is full of constraints, which makes pessimists feel miserable. Optimists,
however, try to get the best out of it. CVXOPT is a rich toolbox to tackle
optimization. Just see the examples on its homepage, particularly those from the book
"Convex Optimization" by Stephen Boyd and Lieven Vandenberghe:

This 730-page book is available on the homepage as a pdf. Together with the
other material in "programs(x86)/pythonxy/doc/Libraries/cvxopt" you have so much
information that my role here should only be to make you aware of this fund. Try to
tackle some of the examples. You may learn a lot. Let us make a tour through one of
them: centers.py (in: ...examples/book/chap8/). The rest is yours.


14.2 Looks simple, but ...


See the schematic of our example, the Chebyshev center. It is not very practical, but
illustrates nicely what optimization within constraints is all about. The task is easily
explained: try to find the largest circle within the given boundaries.

Developing the code is less simple. How do we "tell" the program what to do?
Fortunately we can learn from the given centers.py. We will focus only on the
first few lines:
13 # Extreme points (with first one appended at the end)
14 X = matrix([ 0.55, 0.25, -0.20, -0.25, 0.00, 0.40, 0.55,
15 0.00, 0.35, 0.20, -0.10, -0.30, -0.20, 0.00 ],\
16 (7,2))
17 m = X.size[0] - 1
18
19 # Inequality description G*x <= h with h = 1
20 G, h = matrix(0.0, (m,2)), matrix(0.0, (m,1))
21 G = (X[:m,:] - X[1:,:]) * matrix([0., -1., 1., 0.], (2,2))
22 h = (G * X.T)[::m+1]
23 G = mul(h[:,[0,0]]**-1, G)
24 h = matrix(1.0, (m,1))
25
26 # Chebyshev center
27 #
28 # maximize R
29 # subject to gk'*xc + R*||gk||_2 <= hk, k=1,...,m
30 # R >= 0
31
32 R = variable()
33 xc = variable(2)
34 op(-R, [ G[k,:]*xc + R*blas.nrm2(G[k,:]) <= h[k] for k in \
35 range(m) ] + [ R >= 0] ).solve()
36 R = R.value
37 xc = xc.value


... so few lines! A quarter of them are comments, but it is still quite confusing at first
sight. The author of this example worked with matrices in a virtuoso manner; this is
another reason why I selected it. We may learn some elegant handling techniques.
Let us proceed step by step, print out the intermediate results, and try to understand
what happens.

14.3 Step, print, understand ...


Lines 13-17 are still easy to understand. Printing m and X gives the following
information:

The matrix X contains the x- and y-coordinates of the corners, the first indicated in
red, the second in green, etc. As stated in the first comment: the last corner point is
appended at the end. Why? We will see soon. Then m = 6 stands for the number
of corners.
With line 20 two matrices are prepared, nothing exciting. More exciting is line 21.
What happens here? Let us investigate only the first part of the expression:
G = (X[:m,:] - X[1:,:]) * matrix([0., -1., 1., 0.], (2,2))
print X[:m,:] - X[1:,:]


Now it is clear why matrix X contains 7 coordinate pairs. With X[:m,:] we denote
the first 6 coordinate pairs, with X[1:,:] the last 6. Subtracting yields a matrix with
six (∆x, ∆y)-pairs, i.e. we have prepared the gradients of the lines forming the edges
of the polyhedron.
What about the orthogonal matrix in the second part of the expression in line 21? It
just swaps the two columns and changes the sign of the new first column, leading to
the matrix G:


Line 22 is also a tricky one. Again, let us inspect only the first matrix multiplication:
G is multiplied by the transpose of X, i.e. by X.T. For the first two elements I have
indicated by color where the different terms originate from:
h = (G * X.T)[::m+1]
print G*X.T

So, what is the result of the full expression h of line 22? The diagonal of G*X.T!


... and h has something to do with the offset a and the gradient of the edges.

Line 23 contains a strange expression as well: h[:,[0,0]]**-1. The column is
duplicated and the values are inverted:


Then the whole expression leads to a new G in preparation for a set of inequalities
with h=1.
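If you would like to check these intermediate matrices without CVXOPT's
column-major conventions getting in the way, the same construction can be mirrored
in plain NumPy. A sketch of my own, using the corner coordinates from matrix X above:

import numpy as np

X = np.array([[ 0.55,  0.00], [ 0.25,  0.35], [-0.20,  0.20],
              [-0.25, -0.10], [ 0.00, -0.30], [ 0.40, -0.20],
              [ 0.55,  0.00]])              # last corner repeats the first
m = X.shape[0] - 1

D = X[:m, :] - X[1:, :]                     # the six (dx, dy) differences of line 21
G = np.column_stack((-D[:, 1], D[:, 0]))    # swap the columns, negate the new first one
h = np.sum(G * X[:m, :], axis=1)            # the diagonal of G*X.T from line 22
G = G / h[:, None]                          # line 23: scale each row so that G*x <= 1
print G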

Now everything is prepared for the optimization procedure itself. The task is
described in the comments (lines 26-30). The next figure illustrates what line 29 stands
for: it connects the parameters of the constraints with those of the circle.


The first equation in the red frame describes the line parallel to the first edge and
passing through the circle's center [yc,xc]. Now the trick: a* can be expressed
by R*||G||_2/∆x because the two triangles highlighted in the figure have the
same edge ratios. Finally the respective scaling, i.e. transforming all h-values to 1,
leads to the conditions for the optimization routine op in line 34.

The description of the function op can be found in the CVXOPT User's Guide in
the chapter "Modelling" under "Optimization Problems".
That's it!
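To actually see the solution, one can plot the polygon together with the resulting
circle, e.g. with matplotlib. A sketch of my own (R and xc are the results from lines
36-37; indexing them gives plain floats):

import numpy as np
import matplotlib.pyplot as plt

corners = np.array([[ 0.55,  0.00], [ 0.25,  0.35], [-0.20,  0.20],
                    [-0.25, -0.10], [ 0.00, -0.30], [ 0.40, -0.20],
                    [ 0.55,  0.00]])                    # same points as matrix X
Rv, x0, y0 = R[0], xc[0], xc[1]
t = np.linspace(0.0, 2.0*np.pi, 200)
plt.plot(corners[:, 0], corners[:, 1], 'k-')            # the polygon
plt.plot(x0 + Rv*np.cos(t), y0 + Rv*np.sin(t), 'r-')    # the largest inscribed circle
plt.plot([x0], [y0], 'ro')                              # its center, the Chebyshev center
plt.axis('equal')
plt.show()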


14.4 Do it yourself
My recommendation: read the book mentioned above and try the different examples.
It is really worthwhile, because optimization problems are omnipresent and it is
good to know how to tackle them.
If you dare to really work through that book, you will become an expert in
optimization, which will be useful in many fields: science, engineering,
economics, design, you name it.


Are you certain?

Uncertainties


Chapter 15
Are you certain?
15.1 Aim: Quantifying uncertainties
Who likes uncertainties at all? No one, I guess. However, sometimes it is helpful to
face reality, and in real life most things are not perfectly known. Uncertainty
calculations are not very popular either, and many try to avoid dealing with them.
But here, too, it is true that in many circumstances it can be quite beneficial to
consider measurement uncertainties: to be more confident, to stay on the safe
side, to optimize precision, or to analyze the dominant sources of imprecision.
For all this you need to quantify uncertainties and follow their propagation from
the initial factors which influence e.g. a device's performance to the final output
characteristics and specifications. In the following we apply several approaches
to a model of a simplistic robot: a conventional approach, a matrix formalism,
Python's "Uncertainties" package, and a Monte Carlo scenario. Finally we end up
with the optimal working volume of the robot arm, i.e. the region where the
uncertainties of its positioning are below a given limit.

15.2 The robot


The preceding page shows what the schematic robot looks like:
The robot can be rotated around the z-axis by angle a5, the base arm can be tilted
against the z-axis by angle a3, and the working arm can be tilted against the base arm
by angle a4. The lengths of the base arm, a1, and of the working arm, a2, can be
varied as well.
The question is: how precisely can the working point, i.e. the tip c of the robot, be
controlled for the values and uncertainties given in the table below?

parameter   value     std-dev
a1          150 cm    0.5 cm
a2          100 cm    0.3 cm
a3          30 deg    1 deg
a4          135 deg   1.5 deg
a5          30 deg    0.5 deg

... and the model is defined by this set of equations:
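In the print version these equations appear as a figure; transcribed from the code used
later in this chapter, the model reads:

$$
\begin{aligned}
d   &= a_1 \sin a_3 + a_2 \sin(a_4 - a_3) \\
b_1 &= d \cos a_5, \qquad b_2 = d \sin a_5 \\
b_3 &= a_1 \cos a_3 - a_2 \cos(a_4 - a_3) \\
c   &= \sqrt{b_1^2 + b_2^2 + b_3^2}
\end{aligned}
$$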

15.3 Frightening conventional


The classical approach of uncertainty analysis requires calculating the partial
derivatives with respect to all parameters ai and adding up their squares multiplied
by the respective variances (just to remember: the variance is the square of the
standard deviation, the latter being the uncertainty). Taking the square root leads to
the resultant uncertainties, i.e. in our case for b1:
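Written out (this is the standard law of propagation of uncertainty; u denotes the
standard uncertainty):

$$
u(b_1) = \sqrt{\sum_{i=1}^{5} \left(\frac{\partial b_1}{\partial a_i}\right)^{2} u^{2}(a_i)}
$$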


This already frightens many students, and they try to avoid the topic. However, with
SymPy this should not be too difficult anymore. From what we learned in the
preceding chapter, we may write down the procedure as follows:

1 from __future__ import print_function


2 from sympy import *
3 from sympy.galgebra import xdvi, Get_Program, Print_Function
4 from sympy.galgebra import Format
5
6 def ob():
7 Print_Function()
8 a1, a2, a3, a4, a5, d = symbols('a1 a2 a3 a4 a5 d')
9 b1, b2, b3, c = symbols('b1 b2 b3 c')
10 deg = pi/180.0
11 d = a1*sin(a3) + a2*sin(a4 - a3)
12 print('d =', d)
13 b1 = d * cos(a5)
14 print('b_1 =', b1)
15 db1_da1 = diff(b1,a1)
16 v11 = db1_da1.subs(a3,30.0*deg).subs(a5,30.0*deg).evalf()
17 print('db_1/da_1=', db1_da1, '=', v11)
.
.
.
29 print('db_1/da_5=', db1_da5, '=', v15)
30 b1s = b1.subs(a1,150.0).subs(a2,100.0).subs(a3,30.0*deg)\
31 .subs(a4,135.0*deg).subs(a5,30.0*deg).evalf()
32 print('b_1 =', b1s)
33 sa1, sa2, sa3, sa4, sa5 = 0.5, 0.3, 1.0*deg, 1.5*deg, \
34 0.5*deg
35 sb1 = sqrt(db1_da1**2*sa1**2 + db1_da2**2*sa2**2 + \
36 db1_da3**2*sa3**2 + db1_da4**2*sa4**2 + \
37 db1_da5**2*sa5**2)
38 sb1 = sb1.subs(a1,150.0).subs(a2,100.0).subs(a3,30.0*deg)\
39 .subs(a4,135.0*deg).subs(a5,30.0*deg).evalf()
40 print('sb_1=', sb1)

Cf. the corresponding LaTeX output (note the automatic simplification: compare code
line 13 with the second line shown below):


The calculation for b2, b3, and c follows suit. If you have calculated everything
correctly, while disregarding covariances (see the next sub-chapter), you should have
obtained the values shown in this table:

parameter   value      std-dev
b1          148.6 cm   2.6 cm
b2          85.8 cm    1.9 cm
b3          155.8 cm   3.9 cm
c           231.8 cm   3.2 cm

15.4 Matrix formulation: more professional but even more frightening
The equation for the uncertainty of b1 given in the last sub-chapter:

... could as well be written as follows:


where "t" is indicating matrix transposition, and

Variance is standard deviation squared, i.e.

Covariance is expressed as:

You recognize the difference? Covariances are a measure of correlation between the
parameters. This gives added quality to our uncertainty estimations. This may be
demonstrated for the uncertainty propagation from the bi to c.
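The rendered equations are figures in the print version; spelled out in the notation of
the code below (my transcription, so take the symbols with a grain of salt), the
propagation from the b_i to c reads:

$$
s_c^2 = u^{t} Q_b\, u, \qquad u_i = \frac{\partial c}{\partial b_i}\, s_{b_i}, \qquad
(Q_b)_{ij} = \frac{\mathrm{cov}(b_i, b_j)}{s_{b_i}\, s_{b_j}}, \qquad
\mathrm{cov}(b_i, b_i) = s_{b_i}^2 .
$$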
Note the change in matrix Q: the diagonal elements are still equal to 1, while the
off-diagonal elements now have values between -1 and +1:

Finally:


The respective code:


import numpy as np

# sb1, sb2, sb3, dcdb1, dcdb2, dcdb3, vb12, vb13, vb23 are the values
# obtained with SymPy in the previous sub-chapter
sb = np.array([[sb1, sb2, sb3]])
Tc = np.array([[dcdb1,0,0], [0,dcdb2,0], [0,0,dcdb3]])
uc = Tc.dot(sb.T)

Qb = np.array([[1.0,0.0,0.0],[0.0,1.0,0.0],[0.0,0.0,1.0]])
Qb[0,1] = vb12/sb1/sb2
Qb[1,0] = vb12/sb1/sb2

Qb[0,2] = vb13/sb1/sb3
Qb[2,0] = vb13/sb1/sb3

Qb[1,2] = vb23/sb2/sb3
Qb[2,1] = vb23/sb2/sb3
print 'Qb:', Qb

vc = (uc.T).dot(Qb.dot(uc))[0][0]
sc = np.sqrt(vc)
print 'sc:', sc

Now the Q-matrix contains the covariances indicating the correlations. Note that the
standard deviation for c is now smaller than the one calculated in the previous
sub-chapter! One would get that previous result back by setting the off-diagonal
elements of Q equal to zero, thus ignoring the correlations.

15.5 More intuitive: the Monte Carlo approach


This approach is so simple and, as the figure below shows, very intuitive. One may
ask why we produced all that sweat in the previous sub-chapters.
Good question! Here is the simple code:
1 from numpy import *
2 Nobs = 1000
3 deg = pi/180.0
4
5 a1 = random.normal(150.0, 0.5, size=Nobs)
6 a2 = random.normal(100.0, 0.3, size=Nobs)
7 a3 = random.normal(30.0*deg, 1.0*deg, size=Nobs)
8 a4 = random.normal(135.0*deg, 1.5*deg, size=Nobs)
9 a5 = random.normal(30.0*deg, 0.5*deg, size=Nobs)


10 d = a1*sin(a3) + a2*sin(-a3+a4)
11 b1 = d * cos(a5)
12 b2 = d * sin(a5)
13 b3 = a1*cos(a3) - a2*cos(-a3+a4)
14 c = sqrt(b1**2 + b2**2 + b3**2)
15
16 print std(b1)
17 print std(b2)
18 print std(b3)
19 print std(c)
20 print mean(c)

... and the printed values agree beautifully with what we have calculated before:
2.61236000032
1.9360853283
4.00441976038
1.30717047749
231.772894551

In addition, when plotting the results, one gets a nice illustration of the effect of the
correlations and of the distribution of uncertainties.
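For instance, a quick look with matplotlib (a sketch; b1, b3 and c are the arrays from
the code above) reveals the correlation between b1 and b3 and the shape of the
distribution of c:

import matplotlib.pyplot as plt

plt.subplot(121)
plt.hist(c, bins=30)                  # distribution of the tip distance c
plt.xlabel('c / cm')
plt.subplot(122)
plt.plot(b1, b3, '.', markersize=2)   # scatter plot: b1 versus b3
plt.xlabel('b1 / cm')
plt.ylabel('b3 / cm')
plt.show()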


15.6 Easy going and very fast


No more headaches with uncertainty budgets: use Python's "Uncertainties"
package! See the relevant code for our problem: short and simple! ... and the results
appear very fast indeed.
1 import numpy as np
2 from uncertainties import ufloat
3 from uncertainties.umath import *
4
5 deg = np.pi/180.0
6
7 a1 = ufloat(150.0, 0.5)
8 a2 = ufloat(100.0, 0.3)
9 a3 = ufloat(30.0*deg, 1.0*deg)
10 a4 = ufloat(135.0*deg, 1.5*deg)
11 a5 = ufloat(30.0*deg, 0.5*deg)
12
13 p = a1*sin(a3) + a2*sin(-a3+a4)
14 b1 = p * cos(a5)
15 b2 = p * sin(a5)
16 b3 = a1*cos(a3) - a2*cos(-a3+a4)
17 print b1, b2, b3
18
19 c = sqrt(b1**2 + b2**2 + b3**2)
20 print c


... and the print-out:


148.6+/-2.6 85.8+/-1.9 155.8+/-3.9
231.8+/-1.3

Obviously, this algorithm considers the covariances as well!
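If you are curious where the 1.3 cm comes from, the package can, as far as I remember
its API, also break the result down into the contributions of the individual inputs and
print the covariances explicitly. A sketch:

from uncertainties import covariance_matrix

# contribution of each input variable to the uncertainty of c
for var, err in c.error_components().items():
    print var, err
# covariance matrix of the three intermediate results
print covariance_matrix([b1, b2, b3])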

15.7 Do it yourself!
The latter approach may be a good starting point for looking at other locations in the
space around the robot, in order to find a proper working regime with acceptable
precision. Try it out!


Your own statistics

Statsmodels


Chapter 16
Your own statistics
16.1 Aim 1: Learning from R
By now, you should be experienced enough to find out where examples, tutorials
and tests for packages can be found. Thus, I will not reproduce them here.
Instead, I would like to introduce you to the gold standard of open source statistics
programming: 'R' (https://www.r-project.org/).

I will take you along for a sample session of "translating" R code into Python
'Statsmodels' applications. You may already suspect that this cannot be done in a
simple copy-and-paste manner. I will show you, by example, some pitfalls and
how to work around them.

16.2 Aim 2: Learning from NIST/SEMATECH


Books on statistics are often quite mathematical, and the examples and data used are a
bit artificial or far-fetched. A nice and really practical guide to real-world problems
is the e-Handbook published by NIST and SEMATECH
(http://www.itl.nist.gov/div898/handbook/):
The examples in this handbook are written in R. Of course, you could learn R and
get involved immediately. However, if you prefer to stay in the Python environment
with all these other excellent tools, it should be really worthwhile to learn from these
two sources and connect the three worlds for your applications.


16.3 Preparatory action: involving 'pandas' and 'patsy'


An extremely versatile tool for conditioning data collections for efficient analysis is
'pandas' (https://pypi.python.org/pypi/pandas/0.18.0). Follow the respective guides
and tutorials to find out the strengths of this little helper. See, e.g., how we can
extract the data we need for our example below from the respective handbook
chapter. You just need to select the data with the mouse, right-click-copy them to the
clipboard of your computer and then run the small code shown below. The contents
are then represented by the variable clipdf for further computational manipulation.
Take care: this small program looks only at the current content of the clipboard.
Thus, if you do any other cut or copy action in between, then the next time you run
the program it will simply read whatever happens to be in that temporary memory.
It is better to save the data immediately to a file:
import pandas as pd

# read whatever is currently in the clipboard into a DataFrame ...
clipdf = pd.read_clipboard(index_col=0)
# ... and store it permanently in a csv file
clipdf.to_csv("ceramics_all.csv")


The program we will actually be working with reads these stored data first and then
processes them. For further use we need the function dmatrices from 'patsy'
(https://pypi.python.org/pypi/patsy).
import pandas as pd
from patsy import dmatrices

df = pd.read_csv("ceramics_all.csv")

y, X = dmatrices('Y ~ Batch', data=df, return_type='dataframe')


print y[:4]
print X[:4]

... with the resulting print-out:

         Y
0  608.781
1  569.670
2  689.556
3  747.541

   Intercept  Batch
0          1      1
1          1      2
2          1      1
3          1      2


16.4 The ceramics example


Let us plunge into the handbook and select this example:

Under point 1, "Background and Data", the origin and type of the data and the
purpose of the analysis are explained. This is followed by a table of the data for 480
observations out of 960 runs. These data I have read and saved to the file
ceramics_all.csv as shown in the previous sub-chapter. Point 2, the analysis
of the response variable, is quite basic and could be reproduced by yourself using the
respective tools of 'matplotlib' and 'scipy'. Let us move on to point 3, the analysis of
the batch effect, which is more interesting. Out of the several graphics, let us
reproduce the box plot shown below (cf. next page).


For this problem, the data had to be prepared differently, since the matplotlib
boxplot code needs a numpy array as data input. See the code below:
1 import matplotlib.pyplot as plt
2 import pandas as pd
3 from patsy import dmatrices
4 import numpy as np
5
6 df = pd.read_csv("ceramics_all.csv")
7 df1 = df[df['Batch'].isin([1])]
8 y1, X = dmatrices('Y ~ Batch', data=df1, return_type='matrix')
9 df2 = df[df['Batch'].isin([2])]
10 y2, X = dmatrices('Y ~ Batch', data=df2, return_type='matrix')
11
12 data = np.transpose(np.array([y1[:,0],y2[:,0]]))
13 fig = plt.figure()
14 ax = fig.add_subplot(111)
15
16 # plot box plot
17 bp = ax.boxplot(data, widths=0.5, sym='o')
18
19 # add labels
20 plt.setp(ax, xticks=[y+1 for y in range(2)],
21 xticklabels=['Batch 1', 'Batch 2'])
22 title=ax.set_title('Box Plot by Batch')
23 y_label=ax.set_ylabel('Ceramic Strength')
24 .
25 .
26 .

... plus a few lines (omitted here) to make the figure look nicer.


Still under point 3, but further below, look for 'Quantitative techniques'. Let us
follow the link to the F-test:

This piece of R code shows the elegance of R well. The first block of lines just
prepares the data as we have done before. Then the single line

var.test(strength~batch)

performs the two-sided F-test, which produces the output marked with '>'.

Please follow me as I try to reproduce this with Python's Statsmodels.


16.5 How to reproduce?


First question: what could be the Pythonic equivalent of R's var.test()? This is
not so easy to answer. Looking through scipy.stats, I was not successful, although
the number of statistical functions collected there is overwhelming. Maybe I was
simply overwhelmed and could not find the needle in the haystack. In statsmodels I
found the HetGoldfeldQuandt method:

After some trial and error I managed to write this code. Note the two alternative ways
to get the same result:
1 import pandas as pd
2 from patsy import dmatrices
3 import statsmodels.api as sm
4
5 df = pd.read_csv("ceramics_all.csv")
6 y, X = dmatrices('Y ~ Batch', data=df, \
7 return_type='dataframe')
8
9 HGQ = sm.stats.diagnostic.HetGoldfeldQuandt()
10 print HGQ.run(y,X,idx=1,split=None,drop=None, \
11 alternative='two-sided')
12
13 print sm.stats.diagnostic.het_goldfeldquandt(y,X,idx=1,\
14 split=None,drop=None,alternative='two-sided')


... with the print-out:


(0.89044219738453323, 0.37038293026947938, 'two-sided')
(0.89044219738453323, 0.37038293026947938, 'two-sided')

The first number should be the F statistic and the second the p-value. Comparing
this with the results of the R program (preceding page), only the p-value seems
to be o.k., but not the F statistic: the latter turned out to be 1.123 with the R program.
What went wrong?

16.6 Back to the sources


In order to find out how the two programs calculate the F statistic, we should compare
the respective source codes. I will not go into the depths of how to access the source
code of R programs, but I refer to
https://cran.r-project.org/doc/Rnews/Rnews_2006-4.pdf, pages 43-45. By the way,
this is a very informative journal on R and applied statistics!
Let us focus on finding the source code of HetGoldfeldQuandt(). It is on
your computer, for sure! But where? If you just use the Windows search function
and apply it to C:\ you may fail (at least I did). If you look along this path:
C:\Python27\Lib\site-packages\statsmodels\stats you can at
least find the program diagnostic.py. A closer look at its contents gives us the
hint to follow the "sandbox" path: C:\...\statsmodels\sandbox\stats.
#collect some imports of verified functions
from statsmodels.sandbox.stats.diagnostic import (
acorr_ljungbox, breaks_cusumolsresid, breaks_hansen,
CompareCox, CompareJ, compare_cox, compare_j,
het_breushpagan, HetGoldfeldQuandt, het_goldfeldquandt,
het_arch, het_white, recursive_olsresiduals,
acorr_breush_godfrey, linear_harvey_collier,
linear_rainbow, linear_lm, unitroot_adf)

Here again you may find diagnostic.py, but this time a much more
comprehensive version with lots of functions and classes. Among them at line 752:
HetGoldfeldQuandt! ... and on line 827 the calculation of the F-test and the
two-sided p-value:
827 fval = resols2.mse_resid/resols1.mse_resid
.
.
836 elif alternative.lower() in ['2', '2-sided',
'two-sided']:
837 fpval_sm = stats.f.cdf(fval, resols2.df_resid,
resols1.df_resid)
838 fpval_la = stats.f.sf(fval, resols2.df_resid,
resols1.df_resid)
839 fpval = 2*min(fpval_sm, fpval_la)
840 ordering = 'two-sided'


Now, to make a long story short: look at the old function at lines 663 ff and here
especially at line 723:
723 fval = resols1.mse_resid/resols2.mse_resid

It is just the reciprocal! If we replace the last line of our program by:
result = sm.stats.diagnostic.het_goldfeldquandt(y,X,idx=1,\
split=None,drop=None,alternative='two-sided')
print "F-Test:", 1.0/result[0]
print "p-value:", result[1]

... then we obtain the same result as the R-program. Better: change the respective
line in class HetGoldfeldQuandt.
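Alternatively, once you know what R's var.test actually computes, you can reproduce
it directly with SciPy. A sketch of my own, reusing the batch arrays y1 and y2 from
the box-plot code above:

import numpy as np
from scipy import stats

s1 = np.var(y1[:, 0], ddof=1)           # sample variance of batch 1
s2 = np.var(y2[:, 0], ddof=1)           # sample variance of batch 2
F = s1/s2                               # the ratio reported by var.test
df1, df2 = len(y1) - 1, len(y2) - 1
p = 2.0*min(stats.f.cdf(F, df1, df2), stats.f.sf(F, df1, df2))
print "F-Test:", F
print "p-value:", p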

16.7 Do it yourself!
Try to continue...


Publish or perish!

reportlab


Chapter 17
Publish or perish!
17.1 Aim: documentation, publication, poster creation
Working with text is compulsory for a scientist. You need to communicate with your
colleagues, your professor, your scientific community and with society at large if you
do not want to work in an ivory tower. Recognition and success in your university
career, in your job or in business will depend on your ability to sell your scientific
achievements. Publish or perish. Nothing new for you.
"Nothing new!", you might think as well, when I try, in the following, to convince
you to learn how to use some alternative text processing tools. You are right. You
will feel much more comfortable with the tools you know. Producing texts,
publications, presentations, posters and documents will be much more efficient with
those programs you are experienced with. The added value of using the new tools
may seem negligible compared with the time, frustration and effort it takes to learn
these new text processing techniques.
So, why should you not skip this chapter? There are three main reasons:
• If you are dealing with lots of text documents that contain "structure" and you
want to search, extract, combine, bring into a certain order, link, highlight or
document them, then programming text may be extremely helpful. Examples of
structured texts are address compilations, program scripts, literature reference
lists, user guides, technical documentation, etc. Take the example of
references in scientific publications: it is annoying, but nearly every journal or
book editor requires a different format for citing literature. Rewriting the citations
may be manageable if you are dealing with only a few, but when they become
many, a tool like re (regular expressions) can be of great value (see the toy
example after this list). Regular expressions are built into Python. A good
tutorial may be found in the Python documentation:


• In my opinion there is no better way to learn advanced Python programming in
a hands-on manner than with the following example code for a huge real-world
application: the generation of the ReportLab User Guide.

You may find this via the python(x,y) documentation folder under 'Documentation
generation':

Investigating this program bundle will enable you to gain immediate experience of
object-oriented programming. Any other comprehensive and complex software
bundle would require expert insight into the topics involved. But here nearly
everybody knows the 'objects': text, paragraphs, fonts, images, colors, etc.


The second advantage: we have the resulting output of the program physically at
hand and can thus directly compare and verify what a certain code snippet does to
the output.

• Once you have managed to "reconstruct" the User Guide, you have a template at
hand for a master thesis, a book, or any ambitious document. Just replace the text
and figures, reshuffle and modify the layout, and you are ready without having to
reinvent the wheel. Actually, this is the way I generated this document.
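As announced in the first bullet above, here is a toy example (my own, with an
invented citation) of how re can reshape a reference:

import re

ref = "Koch H., Performing Science with Open Source Software, 2016"
m = re.match(r"(?P<last>\w+)\s+(?P<first>\w)\., (?P<title>.+), (?P<year>\d{4})", ref)
print "%s (%s): %s" % (m.group('last'), m.group('year'), m.group('title'))
# -> Koch (2016): Performing Science with Open Source Software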
For a short-term profit, we will finish this chapter by generating a poster with
reportlab.

17.2 Collecting and debugging the codes


Nowadays one may find files of program code in the depths of the internet. You just
have to find them. The codes that make up the ReportLab User Guide may be
available at, e.g., 'sourcearchive':

Download the code for genuserguide.py, paste it into a new file in the spyder
editor, save it, rename it to e.g. my_genuserguide.py and run it.


Probably you will receive the following error message in your interpreter window:

The last line states: No module named tools.docco.rl_doc_utils, and
the heading of the message informs us that the bug appeared in line 27. Turning to
the editor window, we see in line 27 (and in some lines below as well) the name
tools.docco.rl_doc_utils. Obviously this module could not be imported
because it does not (yet) exist on our computer. Fortunately,
rl_doc_utils.py is in the list of codes at Sourcearchive, together with other
modules we will need to complete our software bundle. Thus look for download
reportlab-2.5/tools/docco/rl_doc_utils.py [code] and proceed
as before: download the code, paste it into a new file, save, rename and run ... and
the next error message appears: SyntaxError: invalid syntax.

Looking at line 227 in the editor reveals that something looks strange. Why does line
227 start with '00227'? If we eliminate this and left-adjust class
Illustration(figures.Figure): then this bug is fixed. But there are
similar bugs further down the code. You have to fix all of them. But take care to keep
the class and def statements properly aligned. Left-adjusting is not always right;
sometimes you have to indent according to the local coding patterns.
After we have fixed all these syntax errors and run this module, we face again our
meanwhile well-known ImportError about the next missing module.


Ergo, we have to download rltemplate.py following the same procedure as before.
After fixing the syntax bugs, running my_rltemplate.py should finish
successfully without error messages.
Now, going back to my_rl_doc_utils and changing rltemplate.py to
my_rltemplate.py in line 15 will result in an error message related to line 16
concerning the missing module stylesheet. Well, by now you will know how to
proceed... Those are the joys of debugging!
Hint: In my_t_parse.py you should 'comment out' (using the '#') the tests for T1,
T2, and T3 in the function 'test'.
What you already recognize at this stage: the software bundle that generates the
reportlab User Guide is made up of several modules that refer to each other in a
certain hierarchical order. One module imports from another module the functions it
needs. The modules themselves are concerted collections of classes and functions
(class and def).
my_genuserguide.py is the top-level module which binds all other modules
together and finally generates the guide. Thus, when you have done all the
debugging and renaming, you should come back to this module. Then replace
tools.docco.rl_doc_utils by my_rl_doc_utils and change the
following lines, i.e. 'comment out' all chapters except chapters 1 and 5.

for f in (
# 'ch1_intro',
'my_ch_1',
# 'ch2_graphics',
.
.
# 'ch5_paragraphs',
'my_ch5_paragraphs',
# 'ch6_tables',
.
.
# 'app_demos',
):
exec open_and_read(f+'.py',mode='t') in G, G

del G
story = getStory()
doc.multiBuild(story)
if verbose: print 'Saved "%s"' % destfn
.
.
.

You do not need to download all the chapter codes, which would be tedious. You
could add them all later, but for now it is sufficient to obtain an incomplete guide
because, at present, we are only interested in the principle.


If everything went right, your program "my_genuserguide.py" should, as the
name suggests, generate your version of the user guide. Now you are able to
directly cross-check the connection between code and result, as indicated in the
figure below:

17.3 Building a pdf poster


While the previous sub-chapter provided you with a complex and comprehensive
example program parcel, which includes all kinds of sample code for various text and
graphics tasks, this sub-chapter will introduce the basic principles of building a
document. In order to produce something useful, we will make a simple poster.
In the beginning: an empty page.


Let us start with a few lines in the interactive mode:
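Something along these lines (my reconstruction of the interactive session; the
standard names from reportlab.lib are assumed) does the job:

from reportlab.lib.units import cm, inch
from reportlab.lib.pagesizes import A4, A0, letter

print A4[0]/cm, A4[1]/cm              # about 21.0 x 29.7, i.e. A4 in centimetres
print A0[0]/cm, A0[1]/cm              # about 84.1 x 118.9, our later poster format
print letter[0]/inch, letter[1]/inch  # 8.5 x 11.0, letter in inches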

First we have to decide upon the size of the page. For this we need some units, i.e.
'cm' and/or 'inch', and standard page sizes like 'A4', 'letter', or, later for the poster, 'A0'.
The modules reportlab.lib.units and reportlab.lib.pagesizes
provide the objects needed. The few lines of code shown above illustrate how to use
them, e.g. to calculate the page sizes of different formats.
But how do we know what modules, classes and functions are available in the
reportlab bundle? Well, first there are the user guide and the reference guide; check
the python(x,y) documentation. A quick way to get a clue is to use the
auto-completion feature in IPython: typing 'from reportlab.' (watch the dot!)
will generate a pop-up window with a list of the modules. If you continue typing, e.g.
'from reportlab.lib.', the respective pop-up window lists the sub-modules of
that module.


Adding e.g. 'from reportlab.lib.pagesizes.' would not help you with
another window. Here 'import reportlab.lib.pagesizes', followed on the
next line by 'reportlab.lib.pagesizes.', would show the complete collection
of page sizes available:
The basic building blocks
The following code contains the essential tools for building a document. We
will discuss them one after another.
1 """Reportlab: generating a poster"""
2 from reportlab.pdfgen import canvas
3 from reportlab.platypus import BaseDocTemplate, Frame, \
4 PageTemplate, Image
5 from reportlab.lib.units import cm
6 from reportlab.lib.pagesizes import A0
7
8 doc = BaseDocTemplate('poster.pdf',showBoundary = 1)
9 doc.pagesize = A0
10 myFrame = Frame(1.0*cm, 1.0*cm, 82.0*cm, 116.0*cm, \
11 id='mainFrame')
12 doc.addPageTemplates([PageTemplate(id = 'first_Page', \
13 frames = myFrame)])
14
15 story = []
16 imgFileName = "any_Logo.png"
17 img = Image(imgFileName)
18 w,h = img.wrap(0,0)
19 scale = 0.5
20 story.append(Image(imgFileName, w * scale, h * scale,))
21
22 doc.build(story, canvasmaker=canvas.Canvas)

At first the needed functions are imported from the different reportlab modules. You
will recognize that BaseDocTemplate, Frame, PageTemplate, etc. show up
in the following lines of code.
The function BaseDocTemplate defines the properties valid for the whole
document, e.g. name, path, page size (cf. lines 8 and 9). It later collects the different
individual page templates. Here, in line 12, only a single page template is defined,
and to this in turn only a single frame has been assigned. A frame, like the one in
line 10, is the area to be filled with content. So far the preparatory phase.
Next the actual story is prepared. A story in this context is a Python list. Thus
we start with an empty list (cf. line 15) and later append the various components
of the story (line 20). In this example the story consists of only one object: an image.
Lines 16 to 20 demonstrate how to import an image file, extract the height and width
of the image, scale it appropriately and enter it into the story.
Finally, in line 22, everything is bound together and the pdf is created and saved
under the name and path given in line 8.


Admittedly, this is only a very simple poster without much value. Next, one could
extend the code quite a bit, e.g. by adding text, math equations, etc., and by
constructing several page templates with different layouts (frames). However, I prefer
to show a different application to round off this book:

17.4 How I prepared the cover of this book


As I have already mentioned, I generated this textbook with Reportlab and published
it via Create Space (https://www.createspace.com/).
One of the trickier exercises is to prepare the book cover. At Create Space a
guide called "How to make a basic cover" offers some useful help, but it utilizes
Adobe's InDesign. This incited me, of course, to try to do the same job with an
open source tool, and I chose Reportlab. Here is my code:
1 from reportlab.pdfgen.canvas import Canvas
2 from reportlab.lib.styles import ParagraphStyle, StyleSheet1
3 from reportlab.lib.units import inch
4 from reportlab.platypus import Paragraph, Frame, Spacer
5 from reportlab.lib.enums import TA_CENTER, TA_JUSTIFY
6 from reportlab.lib import colors
7
8 def my_Styles():
9 stylesheet = StyleSheet1()
10 stylesheet.add(ParagraphStyle(name = 'Normal',
11 fontName = 'Times-Roman',
12 fontSize = 12,
13 leading = 12,
14 alignment = TA_JUSTIFY))
15 stylesheet.add(ParagraphStyle(name='Title',
16 parent=stylesheet['Normal'],
17 fontName = 'Helvetica-Bold',
18 fontSize=26,
19 textColor=colors.navy,
20 leading=30,
21 alignment=TA_CENTER))
22 stylesheet.add(ParagraphStyle(name='Subtitle',
23 parent=stylesheet['Title'],
24 fontSize=20,
25 leading=28,
26 spaceAfter=10))
27 stylesheet.add(ParagraphStyle(name='Author',
28 parent=stylesheet['Title'],
29 fontSize=16))
30 return stylesheet
31
32 styles = my_Styles()
33

So far, the preparatory steps: importing the reportlab modules needed and defining
the paragraph styles to be used for the abstract of the back cover and the title styles
for the front cover.


Now come the intrinsic statements: first, I create a canvas with a page size according
to the Create Space guide "How to make a basic cover". Then, shown here as
commented lines, I draw auxiliary rectangles outlining the trim sizes of the two covers
and one for the spine, all three symmetric to the midlines of the canvas. The spine
width I calculated following the guide.
34 c = Canvas('book-cover.pdf')
35 c.setPageSize((18*inch,12*inch))
36
37 #c.rect(2.761*inch, 1.5*inch, 6*inch, 9*inch, fill=0)
38 #c.rect(8.761*inch, 1.5*inch, 0.4788*inch, 9*inch, fill=0)
39 #c.rect(9.239*inch, 1.5*inch, 6*inch, 9*inch, fill=0)
40

The four "design"-lines and the two images are placed by canvas coordinates:
41 c.line(9.74*inch,10.0*inch,14.74*inch,10.0*inch)
42 c.line(9.74*inch,2.0*inch,14.74*inch,2.0*inch)
43 c.line(3.26*inch,10.0*inch,8.26*inch,10.0*inch)
44 c.line(3.26*inch,2.0*inch,8.26*inch,2.0*inch)
45
46 c.drawImage("my_py.png",3.4*inch,4.4*inch,width=80,height=80)
47 c.drawImage("title_pict.jpg",9.74*inch,6.0*inch,width=5.0*inch,\
48 height=250)
49

Finally, I create two frames fed by paragraphs as flowables and the text for the spine.
50 story_1 = []
51 txt = """Welcome to Python in Science!<br/>
52 A few short stories will introduce you ..."""
53 Par = Paragraph(txt, styles['Normal'])
54 story_1.append(Par)
55 f1 = Frame(3.26*inch,2.0*inch,5*inch,2.2*inch,showBoundary=0)
56 f1.addFromList(story_1,c)
57
58 story_2 = []
59 txt = "Performing Science <br/>with <br/>Open Source Software"
60 Par = Paragraph(txt, styles['Title'])
61 story_2.append(Par)
62 txt = "___<br/>Utilizing Python(x,y)"
63 Par = Paragraph(txt, styles['Subtitle'])
64 story_2.append(Par)
65 story_2.append(Spacer(0, 0.5*inch))
66 txt = "Hans Koch"
67 Par = Paragraph(txt, styles['Author'])
68 story_2.append(Par)
69 f2 = Frame(9.74*inch,2.0*inch,5*inch,3.5*inch,showBoundary=0)
70 f2.addFromList(story_2,c)
71
72 c.translate(9.1*inch,2.0*inch)
73 c.setFont("Helvetica-Bold", 14)
74 c.setFillColor(colors.navy)
75 c.rotate(90)
76 c.drawString(0.0*inch, 0.0*inch, "Hans Koch: Performing ...")
77
78 c.save()


The result, with lines 37-39 un-commented:

17.5 Do it yourself!
Well, prepare your own book!


Tools


Chapter 18
Table of tools
chpt  name           home                  application
1     Python(x,y)    python-xy.github.io/  toolbox
      Anaconda       continuum.io          toolbox
2     Python         python.org            language
3     Spyder         pypi/spyder           workbench
4     Matplotlib     matplotlib.org/       plotting
5     Mayavi         enthought.com         visualizing
7     Physionet      physionet.org/        database
8     NumPy          numpy.org             arrays
9     VTK            vtk.org               visualizing
10    SciPy          scipy.org             science
11    ITK            itk.org               imaging
13    SymPy          sympy.org             algebra
14    CVXOPT         cvxopt.org            optimizing
15    Uncertainties  pypi/uncertainties    uncertainties
16    Statsmodels    pypi/statsmodels      statistics
      R              r-project.org         statistics
      Pandas         pypi/pandas           tables
17    Reportlab      reportlab.com         documentation
