
Licensed to:

mehmet yalcin
mehmet2193@yahoo.com
User #54222
CONTENTS

FEATURES

9 Creating custom PyGTK widgets with Cairo
Sayamindu Dasgupta
Empower your users with the perfect widget.

18 Working with IMAP and iCalendar
Doug Hellmann
Integrate iCalendar functionality into your Exchange-like groupware service.

26 Processing Web Forms Using Anonymous Functions & WSGI
Kevin T. Ryan
Lambdas are dandy, and WSGI is quicker.

33 Extending Python
John Berninger
Using C to teach an old Python new tricks, by hand!

COLUMNS

3| Import This
Welcome to the debut of Python Magazine

5| And now for something completely different
Modern process management modules alleviate GIL woes

40| Welcome to Python
XML Processing with the (now built-in!) ElementTree module

45| Random Hits
The Python Community

Download this month’s code at: http://code.pythonmagazine.com/1/10

WRITE FOR US!


If you want to bring a Python-related topic to the attention of the professional Python community, whether it is personal research,
company software, or anything else, why not write an article for Python Magazine? If you would like to contribute, contact us
and one of our editors will be happy to help you hone your idea and turn it into a beautiful article for our magazine. Visit
www.pythonmagazine.com/c/p/write_for_us or contact our editorial team at write@pythonmagazine.com and get started!
EDITORIAL

Volume 1 - Issue 10

Publisher
Marco Tabini

Editor-in-Chief
Brian Jones

Technical Editor
Doug Hellmann

Contributing Editor
Steve Holden

Columnist
Mark Mruss

Graphics & Layout
Arbi Arzoumani

Managing Editor
Emanuela Corso

Authors
John Berninger, Sayamindu Dasgupta, Doug Hellmann, Steve Holden, Mark Mruss, Kevin T. Ryan

Python Magazine is published twelve times a year by Marco Tabini & Associates, Inc., 28 Bombay Ave., Toronto, ON M3H1B7, Canada.

Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards of use of the information contained herein or in all associated material.

Python Magazine, PyMag the Python Magazine logo, Marco Tabini & Associates, Inc. and the Mta Logo are trademarks of Marco Tabini & Associates, Inc.

For all enquiries, visit http://pythonmagazine.com/c/p/contact_us

Printed in Canada

Copyright © 2003-2007 Marco Tabini & Associates, Inc. All Rights Reserved

3 • Python Magazine • October 2007

>>>import this

Welcome to the premier issue of Python Magazine. Projects like a new magazine tend to feel like a huge party. You put all kinds of time and effort into it, you call decorators, caterers, chair rentals, balloon blowers, clowns, and you get everything coordinated. Then, once everything is in place, you pray that people actually show up.

I'm excited (and relieved!) to see that the community has figured out a couple of things that have made them willing to embrace the idea of a magazine devoted to Python.

First, we've done this before. Python Magazine is not the first magazine I've helped launch, nor is it the first one that MTA (the publisher) has launched. We are already intimately familiar with the problems inherent in trying to produce a magazine that is timely, accurate, thorough, and even entertaining, on a monthly basis, to a global audience. Further, we understand that the magazine has to be seen as a value to the readership, or there won't be one.

The second thing the community seems to have figured out is that our intention is to help further the use of the language by helping to support the community and advocate the language using whatever resources we can make available. Ideas are constantly flowing in. Talks are ongoing. Things are moving forward, and it's exciting to see all of this taking place.

Why this is happening

I guess I'm a bit of a workaholic, maybe. Truth is, I'm an infrastructure services architect (a fancy sort of sysadmin) by day. Part of my job is to write code to do various things and touch various services that I maintain. I had wanted to give Python a try for a while, and an opportunity presented itself. I took the plunge, and fell in love. However, I found that one of my favorite learning resources was unavailable in the Python world: the venerable how-to magazine!

These magazines exist for lots of other topics. There are magazines that'll tell you how to brew beer, how to work with wood, how to play pool, how to cook, how to stay in shape, how to take pictures, how to write, and how to use your computer in various ways (or how to use various computers in one particular way, as the case may be). Heck, there's even a magazine on how to code in PHP! For crying out loud, where's the Python mag?!

There wasn't one. It wasn't for lack of trying. Various attempts had failed to gain momentum for whatever reason, but I wasn't going to let something silly like a path littered with the corpses of past failed attempts get in the way of having a magazine I could read to glean inspiration and knowledge from about my new favorite programming language!

And so, I went to the publisher - the one I had worked with on php|architect. I told him that I wanted to learn Python better, and so did lots of other people, and that there was no magazine for them, and there was no magazine for me. I told him that people who already knew Python didn't know everything they wanted to know, and there was no magazine for them, either. I shed a tear for effect. Now, here we are, only 4 months after the initial "ok, go for it!", and Python Magazine is a reality.

So, in a way, it's happening because I want to know Python better. But it's
also happening because there are lots of people at various experience levels who would like to know how to do something new, or something old in a better way, with Python. It's happening because the community wants it to happen as well. I got lots of positive feedback from various IRC channels, emails to community members, and even Guido himself.

The Premier Issue

This first issue dives right in, and is made to look like we've done this before, rather than spending time dwelling on the fact that this is issue 1 of a new magazine. It's less interesting to people who've shown up for the code to hear us big-headed editors patting ourselves on the back.

However, being that this is a highly specialized magazine, it's likely that I have a built-in, captive audience of people with more than a passing interest in the language. To you I'd like to extend an invitation to get involved in the community, the language, and the magazine. Join (or help start) a Python user group in your area, on your college campus, or in your high school. Help others on the Python mailing lists. Further your skill by helping out an open source project. And, of course, write articles for Python Magazine!

Let us know what you're doing with Python. Tell us your story. We'd like to help share it with everyone! Drop us a line at editors@lists.pythonmagazine.com, or go to http://www.pythonmagazine.com/c/p/write_for_us to send us an article proposal.

And if you're a reader and want to ask questions or report a bug in an article or column you've seen in Python Magazine, you can also send that to editors@lists.pythonmagazine.com, but you can also join us in our IRC channel, #pymag, on irc.freenode.net.

Enjoy this first issue, and welcome to Python Magazine!

Brian Jones is a system/network/database administrator who writes a good bit of Perl, PHP, and Python. He's the co-author of Linux Server Hacks, Volume Two from O'Reilly publishing, founder of Linuxlaboratory.org, contributing editor at Linux.com, and, in a past life, worked as Editor in Chief of php|architect Magazine. In his spare time, he enjoys brewing beer, playing guitar and piano, writing, cooking, and billiards.

Meet the Rest of the Staff at PyMag

Doug Hellmann is a Senior Software Engineer at Racemi. He has been programming in Python since version 1.4 on a variety of Unix and non-Unix platforms. He has worked on projects ranging from mapping to medical news publishing, with a little banking thrown in for good measure.

Steve Holden is a consultant, instructor and author active in networking and security technologies. He is Director of the Python Software Foundation and a recipient of the Frank Willison Memorial Award for services to the Python community.

For the last seven years Mark Mruss has worked as a software developer, programming in the much maligned C++. In 2005 Mark decided it was time to add another language to his arsenal. After reading Eric Raymond's well known article "Why Python?" he set his sights on the inviting world of Python.

Since co-founding (and, some say, printing) php|architect magazine on the back of a napkin in 2002, Arbi Arzoumani has been responsible for making sure that letters are dotted and crossed in the appropriate places in all of MTA's publications and web properties. Purveyor of fine food, sights and sounds at our conferences, his ultimate goal in life is to invent a non-lethal instrument that prevents the technical staff from playing designer without causing (too much) permanent damage to their central nervous systems.



COLUMN

And Now For Something Completely Different


by Doug Hellmann

Has your multi-threaded application grown GILs? Take a look at these packages
for easy-to-use process management and interprocess communication tools.

There is no predefined theme for this column, so I plan to cover a different, likely unrelated, subject every month. The topics will range anywhere from open source packages in the Python Package Index (formerly The Cheese Shop, now PyPI) to new developments from around the Python community, and anything that looks interesting in between. If there is something you would like for me to cover, send a note with the details to doug.hellmann@pythonmagazine.com and let me know, or add the link to your del.icio.us account with the tag "pymagdifferent".

I will make one stipulation for my own sake: any open source libraries must be registered with PyPI and configured so that I can install them with distutils. Creating a login at http://pypi.python.org/ and registering your project is easy, and only takes a few minutes. Go on, you know you want to.

REQUIREMENTS

PYTHON: 2.5

Other Software:
• Richard Oudkerk's "processing" package version 0.34 or higher http://pypi.python.org/pypi/processing
• Vitalii Vanovschi's "parallel python" package http://www.parallelpython.com/

Useful/Related Links:
• "It isn't Easy to Remove the GIL" http://www.artima.com/weblogs/viewpost.jsp?thread=214235
• "Can't we get rid of the Global Interpreter Lock?" http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock

Scaling Python: Threads vs. Processes

In the ongoing discussion of performance and scaling issues with Python, one persistent theme is the Global Interpreter Lock (GIL). While the GIL has the advantage of simplifying the implementation of CPython internals and extension modules, it prevents users from achieving true multi-threaded parallelism by limiting the interpreter to executing byte-codes in one thread at a time on a single processor. Threads which block on I/O or use extension modules written in another language can release the GIL to allow other threads to take over control, of course. But if my application is written entirely in Python, only a limited number of statements will be executed before one thread is suspended and another is started.

Eliminating the GIL has been on the wish lists of many Python developers for a long time – I have been working with Python since 1998 and it was a hotly debated topic even then. Around that time, Greg Stein produced a set of patches for Python 1.5 that eliminated the GIL entirely, replacing it with a whole set of individual locks for the mutable data structures (dictionaries, lists, etc.) that had been protected by the GIL. The result was an interpreter that ran at roughly half the normal speed, a side-effect of acquiring and releasing the individual locks used to replace the GIL.

The GIL issue is unique to the C implementation of the interpreter. The Java implementation of Python, Jython, supports true threading by taking advantage of the underlying JVM. The IronPython port, running on Microsoft's CLR, also has better threading. On the other hand, those platforms are always playing catch-up with new language or library features, so if you're hot to use the latest and greatest, like I am, the C reference-implementation is still your best option.

Dropping the GIL from the C implementation remains a low priority for a variety of reasons. The scope of the changes involved is beyond the level of anything the current developers are interested in tackling. Recently, Guido has said he would entertain patches contributed by the Python community to remove the GIL, as long as performance of single-threaded applications was not adversely affected. As far as I know, no one has announced any plans to do so.

Even though there is a FAQ entry on the subject as part of the standard documentation set for Python, from time to time a request pops up on comp.lang.python or one of the Python-related mailing lists to rewrite the interpreter so the lock can be removed. Each time it happens, the answer is clear: use processes instead of threads.

That response does have some merit. Extension modules become more complicated without the safety of the GIL. Processes typically have fewer inherent deadlocking issues than threads. They can be distributed between the CPUs on a host, and even more importantly, an application that uses multiple processes is not limited by the size of a single server, as a multi-threaded application would be.

Since the GIL is still present in Python 3.0, it seems unlikely that it will be removed from a future version any time soon. This may disappoint some people, but it is not the end of the world. There are, after all, strategies for working with multiple processes to scale large applications. I'm not talking about the well worn, established techniques from the last millennium that use a different collection of tools on every platform, nor the time-consuming and error-prone practices that lead to solving the same problem time and again. Techniques using low-level, operating system-specific, libraries for process management are as passe as using compiled languages for CGI programming. I don't have time for this low-level stuff any more, and neither do you. Let's look at some modern alternatives.

The subprocess module

Version 2.4 of Python introduced the subprocess module and finally unified the disparate process management interfaces available in other standard library packages to provide cross-platform support for creating new processes. While subprocess solved some of my process creation problems, it still primarily relies on pipes for interprocess communication. Pipes are workable, but fairly low-level as far as communication channels go, and using them for two-way message passing while avoiding I/O deadlocks can be tricky (don't forget to flush()). Passing data through pipes is definitely not as transparent to the application developer as sharing objects natively between threads. And pipes don't help when the processes need to scale beyond a single server.

Parallel Python

Vitalii Vanovschi's Parallel Python package (pp) is a more complete distributed processing package that takes a centralized approach. Jobs are managed from


a "job server", and pushed out to individual processing "nodes".

Those worker nodes are separate processes, and can be running on the same server or other servers accessible over the network. And when I say that pp pushes jobs out to the processing nodes, I mean just that – the code and data are both distributed from the central server to the remote worker node when the job starts. I don't even have to install my application code on each machine that will run the jobs.

Here's an example, taken right from the Parallel Python Quick Start guide:

import pp
job_server = pp.Server()
# Start tasks
f1 = job_server.submit(func1, args1, depfuncs1, modules1)
f2 = job_server.submit(func1, args2, depfuncs1, modules1)
f3 = job_server.submit(func2, args3, depfuncs2, modules2)
# Retrieve the results
r1 = f1()
r2 = f2()
r3 = f3()

When the pp worker starts, it detects the number of CPUs in the system and starts one process per CPU automatically, allowing me to take full advantage of the computing resources available. Jobs are started asynchronously, and run in parallel on an available node. The callable object returned when the job is submitted blocks until the response is ready, so response sets can be computed asynchronously, then merged synchronously. Load distribution is transparent, making pp excellent for clustered environments.

One drawback to using pp is that I have to do a little more work up front to identify the functions and modules on which each job depends, so all of the code can be sent to the processing node. That's easy (or at least straightforward) when all of the jobs are identical, or use a consistent set of libraries. If I don't know everything about the job in advance, though, I'm stuck. It would be nice if pp could automatically detect dependencies at runtime. Maybe it will, in a future version.

The processing Package

Parallel Python is impressive, but it is not the only option for managing parallel jobs. The processing package from Richard Oudkerk aims to solve the issues of creating and communicating with multiple processes in a portable, Pythonic way. Whereas Parallel Python is designed around a "push" style distribution model, the processing package is set up to make it easy to create producer/consumer style systems where worker processes pull jobs from a queue.

The package hides most of the details of selecting an appropriate communication technique for the platform by choosing reasonable default behaviors at runtime. The API does include a way to explicitly select the communication mechanism, in case I need that level of control to meet specific performance or compatibility requirements. As a result, I end up with the best of both worlds: usable default settings that I can tweak later to improve performance.

To make life even easier, the processing.Process class was purposely designed to match the threading.Thread class API. Since the processing package is almost a drop-in replacement for the standard library's threading module, many of my existing multi-threaded applications can be converted to use processes simply by changing a few import statements. That's the sort of upgrade path I like.

Listing 1 contains a simple example, based on the examples found in the processing documentation, which passes a string value between processes as an argument to the Process instance and shows the similarity between processing and threading. How much easier could it be?

LISTING 1

#!/usr/bin/env python
# Simple processing example

import os
from processing import Process, currentProcess

def f(name):
    print 'Hello,', name, currentProcess()

if __name__ == '__main__':
    print 'Parent process:', currentProcess()
    p = Process(target=f, args=[os.environ.get('USER', 'Unknown user')])
    p.start()
    p.join()

In a few cases, I'll have more work to do to convert existing code that was sharing objects which cannot easily be passed from one process to another (file or database handles, etc.). Occasionally, a performance-sensitive application needs more control over the communication channel. In these situations, I might still have to get my hands dirty with the lower-level APIs in the processing.connection module. When that time comes, they are all exposed and ready to be used directly.

Sharing State and Passing Data

For basic state handling, the processing package lets me share data between processes by using shared objects, similar to the way I might with threads. There are two types of "managers" for passing objects between processes. The LocalManager uses shared memory, but the types of objects that can be shared are limited by a low-level interface which constrains the data types and


sizes. LocalManager is interesting, but it's not what has me excited. The SyncManager is the real story.

SyncManager implements tools for synchronizing interprocess communication in the style of threaded programming. Locks, semaphores, condition variables, and events are all there. Special implementations of Queue, dict, and list that can be used between processes safely are included as well (Listing 2). Since I'm already comfortable with these APIs, there is almost no learning curve for converting to the versions provided by the processing module.

LISTING 2

#!/usr/bin/env python
# Pass an object through a queue to another process.

from processing import Process, Queue, currentProcess

class Example:
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return '%s (%s)' % (self.name, currentProcess())


def f(q):
    print 'In child:', q.get()


if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=[q])
    p.start()
    o = Example('tester')
    print 'In parent:', o
    q.put(o)
    p.join()

For basic state sharing with SyncManager, using a Namespace is about as simple as I could hope. A namespace can hold arbitrary attributes, and any attribute attached to a namespace instance is available in all client processes which have a proxy for that namespace. That's extremely useful for sharing status information, especially since I don't have to decide up front what information to share or how big the values can be. Any process can change existing values or add new values to the namespace, as illustrated in Listing 3. Changes to the contents of the namespace are reflected in the other processes the next time the values are accessed.

LISTING 3

#!/usr/bin/env python
# Using a shared namespace.

import processing

def f(ns):
    print ns
    ns.old_coords = (ns.x, ns.y)
    ns.x += 10
    ns.y += 10

if __name__ == '__main__':
    # Initialize the namespace
    manager = processing.Manager()
    ns = manager.Namespace()
    ns.x = 10
    ns.y = 20

    # Use the namespace in another process
    p = processing.Process(target=f, args=(ns,))
    p.start()
    p.join()

    # Show the resulting changes in this process
    print ns

Remote Servers

Configuring a SyncManager to listen on a network socket gives me even more interesting options. I can start processes on separate hosts, and they can share data using all of the same high-level mechanisms described above. Once they are connected, there is no difference in the way the client programs use the shared resources remotely or locally.

The objects are passed between client and server using pickles, which introduces a security hole: because unpacking a pickle may cause code to be executed, it is risky to trust pickles from an unknown source. To mitigate this risk, all communication in the processing package can be secured with digest authentication using the hmac module from the standard library. Callers can pass authentication keys to the manager explicitly, but default values are generated if no key is given. Once the connection is established, the authentication and digest calculation are handled transparently for me.

Conclusion

The GIL is a fact of life for Python programmers, and we need to consider it along with all of the other factors that go into planning large scale programs. Both the processing package and Parallel Python tackle the issues of multi-processing in Python head on, from different directions. Where the processing package tries to fit itself into existing threading designs, pp uses a more explicit distributed job model. Each approach has benefits and drawbacks, and neither is suitable for every situation. Both, however, save you a lot of time over the alternative of writing everything yourself with low-level libraries. What an age to be alive!

Doug Hellmann is a Senior Software Engineer at Racemi. He has been programming in Python since version 1.4 on a variety of Unix and non-Unix platforms. He has worked on projects ranging from mapping to medical news publishing, with a little banking thrown in for good measure.
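
The deadlock warning in the subprocess section is easier to see with a small example. The sketch below assumes Python 3 (the column itself targets Python 2.5) and a made-up child program, CHILD_CODE, that upper-cases whatever it reads; the helper name shout() is also hypothetical. It hands both directions of the conversation to subprocess.run(), which feeds the child's standard input and collects its standard output together, so neither side stalls on a full pipe buffer.

```python
import subprocess
import sys

# Hypothetical child program, used only for illustration:
# it upper-cases every line it receives on standard input.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)


def shout(lines):
    """Send lines to a child process over a pipe and return its replies."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input="\n".join(lines) + "\n",  # written to the child's stdin pipe
        capture_output=True,            # read back from the child's stdout pipe
        text=True,
        check=True,                     # raise if the child exits with an error
    )
    return result.stdout.splitlines()


print(shout(["hello", "pipes"]))  # ['HELLO', 'PIPES']
```

Even in this small case the data has to be flattened to text on the way in and parsed again on the way out, which is the lack of transparency the column complains about.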

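
To make the pickle warning in the Remote Servers section concrete, here is a minimal sketch of checking a digest on a pickled message with the standard library's hmac module before unpickling it. It assumes Python 3 and is not the processing package's actual wire protocol; pack(), unpack(), and the shared key are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size


def pack(obj, key):
    """Pickle obj and prepend an HMAC digest computed with the shared key."""
    payload = pickle.dumps(obj)
    digest = hmac.new(key, payload, hashlib.sha256).digest()
    return digest + payload


def unpack(message, key):
    """Verify the digest first, and only unpickle payloads that pass."""
    digest, payload = message[:DIGEST_SIZE], message[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("digest mismatch: refusing to unpickle")
    return pickle.loads(payload)


key = b"hypothetical-shared-secret"  # illustration only; never hard-code real keys
message = pack({"x": 10, "y": 20}, key)
print(unpack(message, key))  # {'x': 10, 'y': 20}
```

The point of the ordering in unpack() is that pickle.loads() never sees bytes from a sender who does not hold the key.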


FEATURE

Creating custom PyGTK widgets with Cairo


by Sayamindu Dasgupta

PyGTK, a set of Python bindings for the popular GTK+ graphical toolkit, provides
a rich collection of commonly used windows, dialog boxes, buttons, layout
elements, and other 'widgets'. However, often a programmer has needs which
go beyond the functionality provided by the built-in widgets in PyGTK. This
article explains how to create new widgets using the Python bindings for Cairo
– the vector graphics library used by GTK+ to perform most of its drawing
operations.

REQUIREMENTS

PYTHON: 2.x

Other Software: PyGTK 2.10 or above

Useful/Related Links:
http://www.pygtk.org
http://www.cairographics.org

About GTK+

GTK+ is one of the most popular free/open source Graphical User Interface (GUI) toolkits around. Though best known as the basic building block of GNOME, a popular free/open source desktop, GTK+ was originally written for the GIMP image editing program (in fact, 'GTK' actually stands for 'Gimp Tool Kit'). Currently, apart from its role in the GNOME and GIMP projects, it is also used to create the GUIs for the XFCE and Rox desktops. In addition, it is also used in embedded devices such as the Nokia N800/N770 (as a part of the Hildon desktop), and the FIC Neo1973 (as a part of the OpenMoko framework).


Though written in C, GTK+ supports object oriented features using GObject and it also has an excellent set of bindings for Python, known as PyGTK. In fact, Python is fast becoming one of the primary languages of choice for upcoming GNOME applications, as more and more developers grow to love the language's simplicity and ease of use. Some of the upcoming GNOME applications written in Python include Sabayon (a user profile editor), Jokosher (a multi-track audio editor), and Pitivi (a video editor), to name just a few. Apart from GTK+, all other major components of the GNOME Development Platform have Python bindings as well, a factor that also contributes to the adoption of Python within GNOME.

About Cairo

Another crucial building block of the GNOME development platform is Cairo, a 2D graphics library with an API similar to the drawing operators offered by PostScript or PDF. Cairo was originally called Xr, though later the name was changed to reflect the fact that the library was not tied to the X windowing system only and supported multiple output "backends" (PDF, PS, X Window System, Image buffers, SVG, Win32 GDI, etc).

Though the library itself is written in C, excellent Python bindings for Cairo exist as well. Moreover, it is almost trivial to convert a Cairo program written in C to Python. As documented on the Cairo website (http://www.cairographics.org), in most cases, you can convert a Cairo based program written in C to a Python program with just a couple of trivial steps that even beginners would have no problem with. There are a few corner cases which make converting a little less straightforward, but those peculiar cases are usually few and far between.

FIGURE 1

How Cairo fits into GTK+

From version 2.8 onwards, GTK+ includes Cairo support, making it possible for developers to access the Cairo drawing API directly from within GTK+. This means that GTK+ developers can use Cairo to draw their widgets using the Cairo API instead of using the GDK (GIMP Drawing Kit) drawing functions. In fact, at present, most of the stock GTK+ widgets and theme engines use Cairo to do the rendering and drawing operations.

Cairo basics

PyGTK/GTK+ is an oft used library, and there are lots of examples and documentation available, like the excellent tutorial for PyGTK at http://pygtk.org/pygtk2tutorial/. Cairo, on the other hand, is somewhat newer, so here's a small introduction to Cairo before moving on to the PyGTK+Cairo part.

Cairo terminology

Cairo draws on a surface (or a destination) which can be your X Window System, a SVG file, a PDF file, or any of the output targets supported by Cairo. You can visualise the surface as a canvas on which you can paint using the Cairo drawing methods. Once you have the surface, you can get a CairoContext object from that surface which keeps track of all drawing related variables and resources as you progress. To do the actual "painting", you will also need to set a source, which is like the paint for your work. The source can be a single color, a gradient or even a previously created surface. The source can also contain alpha channel values (used to set the transparency).
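
As a worked example of what an alpha value does, the sketch below blends a translucent source color over an opaque destination color in plain Python. It assumes Cairo's default OVER compositing operator and color components in the range 0.0 to 1.0; blend_over() is a hypothetical helper written for this illustration, not part of the Cairo API.

```python
def blend_over(source_rgba, dest_rgb):
    """Composite a translucent source color over an opaque destination color."""
    red, green, blue, alpha = source_rgba
    # Each channel is a weighted mix: alpha parts source, (1 - alpha) parts destination.
    return tuple(
        alpha * src + (1.0 - alpha) * dst
        for src, dst in zip((red, green, blue), dest_rgb)
    )


half_transparent_red = (1.0, 0.0, 0.0, 0.5)
white = (1.0, 1.0, 1.0)
print(blend_over(half_transparent_red, white))  # (1.0, 0.5, 0.5)
```

An alpha of 1.0 replaces the destination color entirely, while an alpha of 0.0 leaves it untouched.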
To "transfer" the source to a surface, the fill() method is used. During this transfer, you may have a mask which specifies exactly which areas the fill() method will affect. The boundaries of the "holes" in the mask are specified by paths which you can draw before calling the fill() method. If you want to draw your paths only (for example, if you have a triangular path and you want to draw the outline of that triangle), you can use the stroke() method to do so.

Drawing lines, curves and basic


shapes with Cairo
Assuming that ctx is your CairoContext object, the following code
snippet will draw a straight line segment from the coordinate points
(10, 10) to (120, 130). Note that the origin point (0, 0) of the surface
is the top left corner. The value of the X coordinate increases as you
move from left to right, while the Y coordinate increases downwards.

    ctx.move_to(10, 10)
    ctx.line_to(120, 130)
    ctx.stroke()

The first line (ctx.move_to(10, 10)) begins a new sub-path and sets the
current point to (10, 10). The second line (ctx.line_to(120, 130)) draws
a line from the current point (10, 10) to the point (120, 130). The third
line (ctx.stroke()) actually makes the line visible (you can think of the
stroke() method as something that drags a virtual pen over your path).
One thing to keep in mind while using the stroke() method: the path
information is reset after the stroke operation is completed. To avoid
this, the stroke_preserve() method can be used instead. See Figure 1 for
the result of the above code.

Once you have a way to draw straight lines, drawing any kind of polygon
becomes an easy task. For example, to draw a triangle, you only need to
draw three straight lines between the right coordinates:

    ctx.move_to(50, 10)
    ctx.line_to(125, 150)
    ctx.line_to(175, 150)
    ctx.close_path()
    ctx.stroke()

close_path() is the only new method in the above code snippet. It is a
shortcut which draws a line from the current point (175, 150) to the
beginning of the current sub-path, defined as the point passed to the
last invocation of move_to(). In this case, that's (50, 10). See Figure 2
for the output of the above code. Drawing any other polygon is a similar
process (with the exception of a rectangle, which has its own convenience
method rectangle(x0, y0, width, height) where (x0, y0) is the top left
corner of the rectangle).

Cairo lets you draw cubic Bézier curves with the method
curve_to(x0, y0, x1, y1, x2, y2) where (x0, y0) and (x1, y1) are the two
control points, and (x2, y2) is the point where the curve ends. To draw
arcs, use the method arc(x, y, radius, angle1, angle2) where (x, y) is
the center of the arc, radius is the radius, and angle1 and angle2 are
the starting and end angles of the arc to be drawn. Angles are measured
in radians, with an angle of 0 signifying the direction of the positive
X-axis, and an angle of 90 degrees (math.pi/2) signifying the direction
of the positive Y-axis. Because the Y-axis points downwards, the angle
increases in a clockwise direction. So, for example, to draw only the
lower half of a circle centered at (50, 50) with a radius of 20, you
will have to call the method ctx.arc(50, 50, 20, 0, math.pi). To draw
the full circle instead, the call would be
ctx.arc(50, 50, 20, 0, 2*math.pi).

Colors and text with Cairo

As I mentioned before, the "paint" you'll be using with Cairo is
represented by a source. There are quite a few methods that you can use
to set the source before applying it. The one you'll probably be using
most of the time is set_source_rgb() which lets you specify an RGB color
value for your source (the value of each color component ranging from 0
to 1). So if you want to use the red color, you can use the method
set_source_rgb(1, 0, 0) (the 8-bit RGB value for red, 255, 0, 0, scaled
to the 0 to 1 range).

You may also use the method set_source_rgba() to set the alpha channel
value for transparency. So if you want fully transparent red
(invisible), you would need to use set_source_rgba(1, 0, 0, 0); for a
red which is 75% opaque, you would use set_source_rgba(1, 0, 0, .75);
and for a fully opaque red, you'll need to use
set_source_rgba(1, 0, 0, 1).

To actually put the source on your destination surface, you will need to
use the fill() method which fills up the area enclosed by your path with
the source. So if you want a red triangle (Figure 3), your code would be

    ctx.move_to(50, 10)
    ctx.line_to(125, 150)
    ctx.line_to(175, 150)
    ctx.close_path()
    ctx.set_source_rgb(1, 0, 0)
    ctx.fill()

Note that the path information gets reset after fill() is called. To
avoid this behaviour, you may call fill_preserve() instead. If you want
to fill up your entire surface instead of the area within your path, you
can use the paint() method which will transfer the entire source to the
surface, regardless of the path that you may have created earlier.

For drawing basic text, show_text() is the method you should use.
However, for most text operations you should probably create a
PangoLayout instead. To display it, use the update_layout() method
followed by show_layout() to actually show the text. Using PangoLayout
will give you a lot more flexibility in terms of text appearance and
formatting, and moreover, it will also let you support complex scripts
like Arabic or Indic in your rendering.

The anatomy of a PyGTK widget

Moving on from Cairo (for the time being) to PyGTK, let us start off
with an analysis of the internals of a typical custom PyGTK widget.
Almost any custom widget is created as a subclass of a standard widget
(a gtk.TextView or a gtk.Window, for instance). So the first line of a
custom widget's code looks like class MyWidget(gtk.Window):.

The method you'll probably want to override is the do_expose_event()
method, which is the event handler for an expose event. The expose event
occurs when the widget that receives the signal needs to be redrawn for
some reason. However, when you are writing a widget from scratch (a
subclass of gtk.Widget), some other methods need to be taken care of.
The most important of these methods are:

• do_realize(): This method takes care of creating the window (and
  related resources) for the widget, if required. The do_unrealize()
  method does just the opposite and frees the window.
• do_size_request(): This method handles the request by GTK+ asking the
  widget for its size. Note that it isn't guaranteed that the size
  requested will be granted.
• do_size_allocate(): This method is called when the actual size
  allocated to the widget is known. Apart from saving the size allocated
  to the widget and computing the size of components, this method also
  takes care of moving the widget's window to the new position (and also
  resizing it if required).
• do_draw(): This method is called when the widget is drawn on the
  screen for the first time. This method by default generates artificial
  expose-events, and normally there is no need to change that behaviour
  (that is, override the method) unless you are doing something really
  complicated.

Apart from these, a widget may have several custom signals which are
emitted when a particular event occurs, and they almost always have
extra properties as well. Both of these are managed via GObject. Using
GObject in Python provides quite a few advantages, including support for
signals, type checking for properties, monitoring of properties for
change of values, etc.
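To make "type checking" and "monitoring of properties for change of values" a little more concrete before we meet the real thing, here is a small plain-Python stand-in. It does not use GObject at all: CheckedProperty, Osd and notify() are hypothetical names invented for this sketch, and the real machinery does considerably more. It only shows the idea of a property that rejects values of the wrong type or range and reports every change.

```python
class CheckedProperty(object):
    """Descriptor imitating two GObject property features in plain Python:
    type/range checking and change notification (illustrative only)."""

    def __init__(self, name, kind, minimum, maximum, default):
        self.name = name
        self.kind = kind
        self.minimum = minimum
        self.maximum = maximum
        self.default = default

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        # Type checking: refuse values of the wrong type.
        if not isinstance(value, self.kind):
            raise TypeError('%s must be %s' % (self.name, self.kind.__name__))
        # Range checking: refuse values outside [minimum, maximum].
        if not self.minimum <= value <= self.maximum:
            raise ValueError('%s out of range' % self.name)
        old = self.__get__(obj)
        obj.__dict__[self.name] = value
        # Change monitoring: tell the owner when the value really changed.
        if value != old:
            obj.notify(self.name, old, value)


class Osd(object):
    # A float between 0 and 1, defaulting to 0.5.
    level = CheckedProperty('level', float, 0.0, 1.0, 0.5)

    def __init__(self):
        self.changes = []

    def notify(self, name, old, new):
        self.changes.append((name, old, new))


osd = Osd()
osd.level = 0.75          # accepted, and recorded in osd.changes
```

Assigning 2.0 to osd.level raises ValueError and assigning a string raises TypeError; with GObject this kind of bookkeeping comes from the property declaration rather than from hand-written code.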

An example of a typical custom PyGTK widget

Let us now look at a typical example of a PyGTK widget to understand how
the various parts fit together. We will look into an OSD (On Screen
Display) widget which pops up on the desktop when the user presses the
volume control buttons on her keyboard (Figure 4). It supports a
semi-transparent background, which is one of those things that has
become very common and very easy to use since the advent of Cairo and
other Xorg technologies such as the COMPOSITE extension. However, it
also gracefully degrades to a solid black background if the user is
running it in an environment where translucency is not possible. Note
that, if you wanted, it is possible to make an entire gtk.Window
translucent with the set_opacity() method, but in this case, only the
background needs to be translucent. The widget also shows an icon for a
speaker and a white bar to indicate the volume level. Since the widget
is a special form of GTK window (one without any decoration, and with a
custom background), we'll make it a subclass of gtk.Window. Since we
want to change the look and feel of the widget, we will be overriding
the signal handler for the expose event (the do_expose_event() method).
The widget will also have a property called level which will be used to
set the length of the volume level indicator bar. Moreover, contrary to
the behaviour of a normal gtk.Window, our window will also emit a
'clicked' signal if someone clicks on it. This will allow the developer
who is using the widget to close it (or do something even fancier) if
the user clicks on the window.

A walkthrough of the code

    import random
    import pygtk
    pygtk.require('2.0')
    import gtk
    from gtk import gdk
    import cairo
    import gobject
    if gtk.pygtk_version < (2,10,0):
        print('PyGtk 2.10.0 or later required')
        raise SystemExit

So, without any more boring theory, let us dive straight into the code.
The first few lines are usual Python stuff. Apart from gtk, gdk, cairo
and gobject, we also import the random module as we will use this for a
demo of the widget once we are done. We will also be using a few
features specific to PyGTK 2.10 and above, so we do a check for the
version we are using (if gtk.pygtk_version < (2,10,0):). We want our
OsdWindow widget to be a subclass of gtk.Window, so we declare our class
with class OsdWindow(gtk.Window):.

Dealing with signals

    __gsignals__ = {
        'expose-event': 'override',
        'screen-changed': 'override',
        'clicked' : (gobject.SIGNAL_RUN_LAST,
            gobject.TYPE_NONE,
            ())
    }

The __gsignals__ dictionary has all the signals we would want to deal
with. The key of each pair in the dictionary is the name of the signal.
We override the handlers for the built-in class-specific callbacks for
expose-event and screen-changed, so we set the value of the relevant
pairs to 'override'.

Note: In most of the PyGTK related documentation, you may see the above
process referred to as "overriding the class closures". In very
simplified terms, a closure is an abstraction of the callback concept
and it contains, along with the callback function, related stuff such as
user data supplied to the callback, etc. When a signal is emitted, a
series of closures are invoked, one of them being specific to the class
and hence known as the class closure.

Finally, the third signal (clicked) is something that we define on our
own, and the value part of the pair is a tuple containing the following
members:

• gobject.SIGNAL_RUN_LAST: This is the signal flag, determining when the
  class closure for the signal would be invoked. SIGNAL_RUN_LAST
  indicates that the invocation should be during the third stage of
  signal emission. For more information on this, you can check out the
  signals section of the GObject manual at
  http://developer.gnome.org/doc/API/2.0/gobject/signal.html.
• gobject.TYPE_NONE: This signal does not return anything, so the second
  value is set to gobject.TYPE_NONE.
• The third value is an empty tuple. This tuple is supposed to contain
  the types of all the parameters to the signal. We have none, so the
  tuple is empty.
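The "stages" mentioned for SIGNAL_RUN_LAST are easier to picture with a toy model. The sketch below is plain Python, not GObject: ToySignal and its run_last argument are hypothetical names made up for illustration, and real emission has further stages (handlers connected "after", cleanup) that are left out here. It only shows the ordering that matters for our 'clicked' signal: with the run-last flag, handlers attached with connect() run first and the class's own handler runs after them.

```python
class ToySignal(object):
    """Toy model of signal emission order (not the GObject implementation)."""

    def __init__(self, class_handler, run_last=True):
        self.class_handler = class_handler  # stands in for the class closure
        self.run_last = run_last            # True mimics SIGNAL_RUN_LAST
        self.handlers = []                  # callbacks added via connect()

    def connect(self, callback):
        self.handlers.append(callback)

    def emit(self, *args):
        if not self.run_last:
            # "Run first" style: the class handler goes before user handlers.
            self.class_handler(*args)
        for callback in self.handlers:
            # User handlers run in the order they were connected.
            callback(*args)
        if self.run_last:
            # "Run last" style: the class handler goes after user handlers.
            self.class_handler(*args)


log = []
clicked = ToySignal(lambda: log.append('class handler'), run_last=True)
clicked.connect(lambda: log.append('user handler'))
clicked.emit()
```

After the emit() call, log holds 'user handler' followed by 'class handler'; flipping run_last to False reverses the two.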


Once you define a custom signal for yourself, you can emit the signal
whenever you want with the emit() method (we will be coming back to this
later in the code).

Dealing with properties

    __gproperties__ = {
        'level': (gobject.TYPE_FLOAT,
            'OSD level',
            'value for the OSD level indicator',
            0, 1, 0.5, gobject.PARAM_READWRITE)
    }

The properties of our widget are specified via the __gproperties__
dictionary. We need to have only one property called level which is
specified as the key of the first (and only) pair in the dictionary. The
value is a tuple containing the following members:

• gobject.TYPE_FLOAT: This specifies the type of the property. Other
  types can be TYPE_STRING, TYPE_INT, TYPE_BOOLEAN, TYPE_PYOBJECT, etc.
• The second value is a short string describing the property.
• The third value is a longer description of the property.
• The fourth value specifies the minimum value of the property. Note
  that this is only valid for certain types of properties such as
  TYPE_INT, TYPE_FLOAT, etc. In other cases, it may either refer to the
  default value of the property or the property flags (explained later).
• The fifth value specifies the maximum value of the property. Again, as
  above, this is applicable only for certain types of properties.
• The sixth value specifies the default value of the property in our
  case.
• The seventh value specifies the property flags, which are set to
  gobject.PARAM_READWRITE. Other flags include PARAM_CONSTRUCT,
  indicating that the property would be set during object construction,
  PARAM_CONSTRUCT_ONLY, indicating that the property would be set during
  object construction only, PARAM_LAX_VALIDATION, indicating that strict
  validation is not required for handling the values, PARAM_READABLE,
  which indicates that the property is readable, and PARAM_WRITABLE,
  which should be used if the property is writeable. Note that property
  flags can be combined (e.g. gobject.PARAM_CONSTRUCT |
  gobject.PARAM_WRITABLE indicates that the property is writeable and it
  is set during object construction).

GObject requires that two methods, do_get_property() and
do_set_property(), be defined. These are called whenever someone tries
to access these properties. For our example, the methods are described
in Listing 1.

Note: When you have property names with more than one word, GObject
translates the - (hyphen) to _ (underscore) and vice versa. So a
property representing a Python variable "update_speed" would be
translated into "update-speed" by GObject. This is something that you
should keep in mind while working on your code.

LISTING 1

    def do_get_property(self, property):
        if property.name in self.__gproperties__:
            return getattr(self, property.name)
        else:
            raise AttributeError('unknown property %s' % property.name)

    def do_set_property(self, property, value):
        if property.name in self.__gproperties__:
            return setattr(self, property.name, value)
        else:
            raise AttributeError('unknown property %s' % property.name)

The constructor

The constructor for our widget (Listing 2) takes the initial OSD level
as an argument. It calls the constructor of the widget's superclass
(gtk.Window) and sets the type to gtk.WINDOW_POPUP so that the window
manager will not register our window. Thus the window will remain
undecorated, it will not appear in the panel and users will not be able
to resize/move it. We also call set_app_paintable(True) so that we can
draw the window background ourselves, instead of GTK+ painting it in the
opaque color specified in our theme. The position of the window is set
to gtk.WIN_POS_CENTER, so that it appears in the middle of the screen.
We also call self.do_screen_changed() which sets up the widget for the
current screen. To make the widget emit the "clicked" signal when
clicked, first we add the BUTTON_PRESS_MASK to the event mask for our
widget and then we set up the widget to emit the signal when a button
press is received. Both steps (extending the event mask and emitting the
signal) are done via the following code:

    self.add_events(gtk.gdk.BUTTON_PRESS_MASK)
    self.connect('button-press-event',
        lambda x, y: self.emit('clicked'))

LISTING 2

    def __init__(self, level=0.5):
        gtk.Window.__init__(self, type=gtk.WINDOW_POPUP)
        self.set_app_paintable(True)
        self.set_position(gtk.WIN_POS_CENTER)
        self.do_screen_changed()

        self.add_events(gtk.gdk.BUTTON_PRESS_MASK)
        self.connect('button-press-event',
            lambda x, y: self.emit('clicked'))

        self.level = level


And in the final step of our constructor, we set the level property of
our widget.

The expose event handler

The do_expose_event() method (Listing 3) is called when the widget gets
an 'expose-event' (i.e., when all or part of the widget needs to be
redrawn for some reason). Before anything else, we get a CairoContext
for our widget by invoking self.window.cairo_create(), and we will be
using this CairoContext for all of our draw operations. The
ctx.rectangle() call marks out the exact region that would be affected
by our drawing operations (the exposed area). Once this region is
determined, the ctx.clip() call masks out all other parts of the surface
that sit outside this region. For our drawing, we want to operate in
normal "source" mode, and not in "over" or "xor" mode (where drawing the
same thing twice will have the same effect as deleting it), so we set
the compositing operator to cairo.OPERATOR_SOURCE.

LISTING 3

    def do_expose_event(self, event):
        ctx = self.window.cairo_create()
        alpha = self.supports_alpha and self.is_composited()

        ctx.rectangle(event.area.x, event.area.y,
            event.area.width, event.area.height)
        ctx.clip()
        ctx.set_operator(cairo.OPERATOR_SOURCE)
        if alpha:
            ctx.set_source_rgba(1.0, 1.0, 1.0, 0.0)
        else:
            ctx.set_source_rgb(1.0, 1.0, 1.0)
        ctx.paint()

        x0 = event.area.x
        y0 = event.area.y
        width = event.area.width
        height = event.area.height
        x1 = x0 + width
        y1 = y0 + height
        radius = 40
        ctx.move_to(x0, y0+radius)
        ctx.curve_to(x0, y0+radius, x0, y0, x0+radius, y0)
        ctx.line_to(x1-radius, y0) # Top line segment
        ctx.curve_to(x1-radius, y0, x1, y0, x1, y0+radius)
        ctx.line_to(x1, y1-radius) # Right line segment
        ctx.curve_to(x1, y1-radius, x1, y1, x1-radius, y1)
        ctx.line_to(x0+radius, y1) # Bottom line segment
        ctx.curve_to(x0+radius, y1, x0, y1, x0, y1-radius)
        ctx.close_path() # Left line segment
        if alpha:
            ctx.set_source_rgba(0, 0, 0, 0.5)
        else:
            ctx.set_source_rgb(0, 0, 0)
        ctx.fill()

        x0 = event.area.width/4
        y0 = event.area.height/3
        width = event.area.width/2
        height = event.area.height/2
        ctx.set_line_width(5)
        ctx.move_to(x0, y0)
        ctx.rel_line_to(width/2, 0)
        ctx.rel_line_to(width/3, -height/4)
        ctx.rel_curve_to(0, 0, height/3,
            height/2, 0, height)
        ctx.rel_line_to(-width/3, -height/4)
        ctx.rel_line_to(-width/2, 0)
        ctx.close_path()
        ctx.set_source_rgb(1, 1, 1)
        ctx.stroke()

        x0 = event.area.width/8
        y0 = event.area.height - event.area.height/8
        length = event.area.width - event.area.width/8 - x0

        ctx.set_line_width(10)
        ctx.set_dash((10, 5), 0)
        ctx.move_to(x0, y0)
        ctx.line_to(length*self.level+x0, y0)
        ctx.stroke()

Although we want translucency in our widget, there are a lot of systems
out there which do not have translucency (due to older software, or
sometimes because the user has chosen to disable translucency). Hence,
we need to gracefully handle (or at least try to handle) systems which
do not have support for translucency. In order to do that, we perform
some checks in our do_screen_changed() method to figure out if our X
server supports RGBA visuals and set the value of supports_alpha
accordingly. However, even if the X server supports RGBA visuals, a
compositing manager may not be running in the system, and in such a
situation, we cannot rely on the alpha channel being drawn correctly on
the screen. For such a situation, the gtk.Widget class in PyGTK 2.10 and
above provides a method called is_composited() which returns True if a
compositing manager is running for the screen on which the widget is
being displayed. We use both the value of supports_alpha and the value
returned by is_composited() to figure out whether we should be using
alpha channels and set the source accordingly.

    if self.supports_alpha and self.is_composited():
        ctx.set_source_rgba(1.0, 1.0, 1.0, 0.0)
    else:
        ctx.set_source_rgb(1.0, 1.0, 1.0)
    ctx.paint()

If the alpha channel is supported, we paint the entire background of the
widget to be transparent (with an alpha value of 0.0).

Next, we move on to draw the base image for our widget, a translucent
rectangle. However, we cannot use the rectangle() method here, since we
want our rectangle to have rounded corners. So we create a path for such
a rectangle using successive calls to line_to() and curve_to() (we
calculate the originating point for the rectangle, as well as its
dimensions, based on the dimensions of the widget). Once the path is
created, we set the source to black with 50% transparency and call the
fill() method.

Once the rectangle has been drawn, we follow a similar pattern of
calculating our coordinates based on the size of the widget for drawing
the icon for the speaker. Note the use of relative coordinate versions
of the drawing methods while creating the speaker icon.

For the volume level indicator bar, we use a hack. We essentially draw a
thick dashed line which looks like a bar.

    ctx.set_line_width(10)
    ctx.set_dash((10, 5), 0)
    ctx.move_to(x0, y0)
    ctx.line_to(length*self.level+x0, y0)
    ctx.stroke()

The first line sets the line width to 10 pixels. The second line sets
the line to be dashed from the very beginning, with each dash being 10
pixels long, and the gap between two dashes being 5 pixels. The last
three lines actually draw the line (bar).

The screen change event handler

    def do_screen_changed(self, old_screen=None):
        screen = self.get_screen()
        colormap = screen.get_rgba_colormap()
        if colormap:
            self.supports_alpha = True
        else:
            colormap = screen.get_rgb_colormap()
            self.supports_alpha = False
        self.set_colormap(colormap)

This event handler is there for making sure that the widget does not act
strangely if the screen for it changes. It sets the value of the
supports_alpha variable to False if the screen does not support RGBA
visuals, and thus the expose event handler knows that it should not draw
using alpha channels until the screen is changed again.

The _set_level() method

    def _set_level(self, level):
        self._level = level
        if self.window:
            alloc = self.get_allocation()
            rect = gdk.Rectangle(alloc.x, alloc.y,
                alloc.width, alloc.height)
            self.window.invalidate_rect(rect, True)
            self.window.process_updates(True)

This method lets the developer set the value of the widget's level
property. After a new value is received, it also forces the widget to
redraw so that the level indicating bar gets updated accordingly. This
is done via the methods invalidate_rect() and process_updates(), which
send a synthetic expose event to the widget.

And in the end, the demonstration

The demonstration (Listing 4) is fairly straightforward. The widget is
displayed on screen, and its level property value is changed every
second (via gobject.timeout_add()) to some random value. On clicking,
gtk.main_quit() is called, which terminates the demo.

LISTING 4

    def clicked_cb(window):
        print("Exiting...")
        gtk.main_quit()

    def update(window):
        # Assign through the level property, as in Listing 5;
        # OsdWindow does not define a public set_level() method.
        window.level = random.random()
        return True

    if __name__ == '__main__':
        window = OsdWindow(level=0.75)
        window.connect('delete-event', gtk.main_quit)
        window.connect('clicked', clicked_cb)
        window.show()

        gobject.timeout_add(1000, update, window)

        gtk.main()

Final words and conclusion

This was a small demo (for the entire program's code, see Listing 5) of
what is possible by combining PyCairo and PyGTK together. Almost every
week, cool new widgets with really fancy effects are being created by
developers all over the world, and I hope that my article will serve as
the "initial push" for my readers in that direction.

Readers who want to know more can look at the PyGTK website
(http://www.pygtk.org). Be sure to read the FAQ for in-depth information
on just about every aspect of the toolkit. Cairo is a comparatively
newer technology, but it is being adopted at a tremendous pace, and
examples, tutorials, etc. are available all over the Internet. Most of
the documents and tutorials are listed at the Cairo website
(http://www.cairographics.org), and I would encourage you to look at the
Samples section of the website (http://cairographics.org/samples/) where
a large number of code snippets illustrate the various renderings
possible via Cairo.

Editor's note: The code for this article was tested under Linux, but if
you develop with GTK+ under Microsoft Windows, the same concepts apply
and most, if not all, of the code should be reusable.

LISTING 5

    #!/usr/bin/env python

    # Demonstration of custom PyGTK widgets
    # Author: Sayamindu Dasgupta

    import random

    import pygtk
    pygtk.require('2.0')
    import gtk
    from gtk import gdk
    import cairo
    import gobject


    if gtk.pygtk_version < (2,10,0):
        print('PyGtk 2.10.0 or later required')
        raise SystemExit

    class OsdWindow(gtk.Window):
        __gsignals__ = {
            'expose-event':   'override',
            'screen-changed': 'override',
            'clicked' : (gobject.SIGNAL_RUN_LAST,
                gobject.TYPE_NONE,
                ())
        }

        __gproperties__ = {
            'level':    (gobject.TYPE_FLOAT,
                'OSD level', 'value for the OSD level indicator',
                0, 1, 0.5, gobject.PARAM_READWRITE)
        }

        def __init__(self, level=0.5):
            gtk.Window.__init__(self, type=gtk.WINDOW_POPUP)
            self.set_app_paintable(True)
            self.set_position(gtk.WIN_POS_CENTER)
            self.do_screen_changed()

            self.add_events(gtk.gdk.BUTTON_PRESS_MASK)
            self.connect('button-press-event',
                lambda x, y: self.emit('clicked'))

            self.level = level

        def do_get_property(self, property):
            if property.name in self.__gproperties__:
                return getattr(self, property.name)
            else:
                raise AttributeError('unknown property %s' % property.name)

        def do_set_property(self, property, value):
            if property.name in self.__gproperties__:
                return setattr(self, property.name, value)
            else:
                raise AttributeError('unknown property %s' % property.name)

        def _set_level(self, level):
            self._level = level
            if self.window:
                alloc = self.get_allocation()
                rect = gdk.Rectangle(alloc.x, alloc.y,
                    alloc.width, alloc.height)
                self.window.invalidate_rect(rect, True)
                self.window.process_updates(True)
        # Enforce how the level is set, so it doesn't change without updating the UI
        level = property(lambda self: self._level, _set_level)

        def do_expose_event(self, event):
            ctx = self.window.cairo_create()
            alpha = self.supports_alpha and self.is_composited()

            ctx.rectangle(event.area.x, event.area.y,
                event.area.width, event.area.height)
            ctx.clip()
            ctx.set_operator(cairo.OPERATOR_SOURCE)
            if alpha:
                # Fully transparent background (alpha 0.0), as in Listing 3
                ctx.set_source_rgba(1.0, 1.0, 1.0, 0.0)
            else:
                ctx.set_source_rgb(1.0, 1.0, 1.0)
            ctx.paint()

            x0 = event.area.x
            y0 = event.area.y
            width = event.area.width
            height = event.area.height
            x1 = x0 + width
            y1 = y0 + height
            radius = 40
            ctx.move_to(x0, y0+radius)
            ctx.curve_to(x0, y0+radius, x0, y0, x0+radius, y0)
            ctx.line_to(x1-radius, y0) # Top line segment
            ctx.curve_to(x1-radius, y0, x1, y0, x1, y0+radius)
            ctx.line_to(x1, y1-radius) # Right line segment
            ctx.curve_to(x1, y1-radius, x1, y1, x1-radius, y1)
            ctx.line_to(x0+radius, y1) # Bottom line segment
            ctx.curve_to(x0+radius, y1, x0, y1, x0, y1-radius)
            ctx.close_path() # Left line segment
            if alpha:
                ctx.set_source_rgba(0, 0, 0, 0.5)
            else:
                ctx.set_source_rgb(0, 0, 0)
            ctx.fill()

            x0 = event.area.width/4
            y0 = event.area.height/3
            width = event.area.width/2
            height = event.area.height/2
            ctx.set_line_width(5)
            ctx.move_to(x0, y0)
            ctx.rel_line_to(width/2, 0)
            ctx.rel_line_to(width/3, -height/4)
            ctx.rel_curve_to(0, 0, height/3,
                height/2, 0, height)
            ctx.rel_line_to(-width/3, -height/4)
            ctx.rel_line_to(-width/2, 0)
            ctx.close_path()
            ctx.set_source_rgb(1, 1, 1)
            ctx.stroke()

            x0 = event.area.width/8
            y0 = event.area.height - event.area.height/8
            length = event.area.width - event.area.width/8 - x0

            ctx.set_line_width(10)
            ctx.set_dash((10, 5), 0)
            ctx.move_to(x0, y0)
            ctx.line_to(length*self.level+x0, y0)
            ctx.stroke()

        def do_screen_changed(self, old_screen=None):
            screen = self.get_screen()
            colormap = screen.get_rgba_colormap()
            if colormap:
                self.supports_alpha = True
            else:
                colormap = screen.get_rgb_colormap()
                self.supports_alpha = False
            self.set_colormap(colormap)


    def clicked_cb(window):
        print("Exiting...")
        gtk.main_quit()

    def update(window):
        window.level = random.random()
        return True

    if __name__ == '__main__':
        window = OsdWindow(level=0.75)
        window.connect('delete-event', gtk.main_quit)
        window.connect('clicked', clicked_cb)
        window.show()

        gobject.timeout_add(1000, update, window)

        gtk.main()

Sayamindu Dasgupta is an engineering student from Kolkata, India. Apart
from coordinating the bn_IN (Bengali, India) translations for GNOME, he
is involved with the Sabayon project and the Exaile Media Player
project. When not around computers, he likes to play with his pet cat or
fiddle with his digital camera.



F E AT U R E

Working with IMAP and iCalendar
by Doug Hellmann


What can you do to access group calendar information if your Exchange-like
mail and calendaring server does not provide iCalendar feeds, and you do not,
or cannot, use Outlook? Use Python to extract the calendar data and generate
your own feed, of course! This article discusses a surprisingly short program
to perform what seems like a complex operation: scan IMAP folders, extract
iCalendar attachments, and merge the contained events together into a single
calendar.

REQUIREMENTS

PYTHON: 2.x

Other Software: Max M's icalendar library, from
http://codespeak.net/icalendar/

Useful/Related Links:
• Source for this program
  http://www.doughellmann.com/projects/mailbox2ics/
• RFC 2445 - iCalendar specification
  http://www.ietf.org/rfc/rfc2445.txt
• IMAP specification
  http://www.ietf.org/rfc/rfc3501.txt
• Python standard library imaplib documentation
  http://docs.python.org/lib/module-imaplib.html

I recently needed to access shared schedule information stored on an
Exchange-like mail and calendaring server. In this article, I will
discuss how I combined an existing third-party open source library with
the tools in the Python standard library to create a command line
program called mailbox2ics for converting the calendar data into a
format I could bring into my desktop client directly. The final product
is just under 140 lines long, including command line switch handling,
some error processing, and debug statements; far shorter than I had
anticipated. The output file produced can be consumed by any scheduling
client which supports the iCalendar standard.

Using Exchange, or a compatible replacement, for email and scheduling
makes sense for many environments. The client program, Microsoft
Outlook, is usually familiar to non-technical staff members, who are
able to hit the ground running instead of trying to figure out how to
accomplish their basic communication tasks. However, my laptop runs Mac
OS X and I do not have Outlook. Purchasing a copy of Outlook at my own
expense, not to

18 • Python Magazine • October 2007


F E AT U R E Working with IMAP and iCalendar

mention inflicting further software bloat on my already crowded computer,
seemed like a suboptimal solution.

Changing the server software was also not an option. A majority of the
users already had Outlook and were accustomed to using it for their
scheduling, and I did not want to have to support a different server
platform. What I needed, then, was a way to pull the data out of the
existing server so I could convert it to a format that I could use with my
usual tools: Apple's iCal and Mail.

With iCal, as with many other standards-compliant calendar tools, it is
possible to subscribe to calendar data feeds. Unfortunately, the server we
were using did not have the ability to export the schedule data in a
standard format using a single file or URL. However, the server did provide
access to the calendar data via IMAP using shared public folders. I decided
to write a Python program to extract the data from the server and convert
it into a usable feed. The feed could then be passed to iCal, which would
merge the group schedule with the rest of my calendar information so I
could see the group events alongside my other meetings, deadlines, and
reminders about when the recycling is picked up on our street.

IMAP Basics

The calendar data with which I was working was only accessible as
attachments on email messages on an IMAP server. The messages were grouped
into several folders, with each folder representing a separate public
calendar used for a different purpose (meeting room schedules, event
planning, holiday and vacation schedules, etc). I had read-only access to
all of the messages in the public calendar folders. Each email message
typically had one attachment describing a single event. To produce the
merged calendar, I needed to scan several folders, read each message in the
folder, find and parse the calendar data in the attachments, and identify
the calendar events. Once I had identified the events to include in the
output, I needed to add them to an output file in a format iCal
understands.

Python's standard library includes the imaplib module for working with IMAP
servers. The IMAP4 and IMAP4_SSL classes provide a high level interface to
all of the features I needed: connecting to the server securely, accessing
mailboxes, finding messages, and then downloading them. To experiment with
retrieving data from the IMAP server, I started by establishing a secure
connection to the server on the standard port for IMAP over SSL, and
logging in using my regular account. This would not be a desirable way to
run the final program on a regular basis, but it worked fine for
development and testing.

    mail_server = imaplib.IMAP4_SSL(hostname)
    mail_server.login(username, password)

It is also possible to use IMAP over a non-standard port, when necessary.
In that case, the caller can pass port as an additional option to
imaplib.IMAP4_SSL(). To work with an IMAP server without the SSL encryption
layer, you can use the IMAP4 class, but using SSL is definitely preferred
whenever possible.

    mail_server = imaplib.IMAP4_SSL(hostname, port)
    mail_server.login(username, password)

The connection to the IMAP server is "stateful". The client remembers which
methods have been called on it, and changes its internal state to reflect
those calls. The internal state is used to detect logical errors in the
sequence of method calls without the round-trip to the server.

On an IMAP server, messages are organized into "mailboxes". Each mailbox
has a name and, since mailboxes might be nested, the full name of the
mailbox is the path to that mailbox. Mailbox paths work just like paths to
directories or folders in a filesystem. The paths are single strings, with
levels usually separated by forward slash (/) or period (.). The actual
separator value used depends on the configuration of your IMAP server; one
of my servers uses a slash, while the other uses period. If you do not
already know how your server is set up, you will need to experiment to
determine the correct folder names.

Once I had my client connected to the server, the next step was to call
select() to set the mailbox context to be used when searching for and
downloading messages.

    mail_server.select('Public Folders/EventCalendar')
    # or
    mail_server.select('Public Folders.EventCalendar')

After a mailbox is selected, it is possible to find messages in the mailbox
using search(). The IMAP method search() supports filtering to identify
only the messages you need. You can search for messages based on the
content of the message headers, with the rules evaluated on the server
instead of your client, thus reducing the amount of information the server
has to transmit to the client. Refer to RFC 3501 ("Internet Message Access
Protocol") for details about the types of queries which can be performed
and the syntax for passing the query arguments.

In order to implement mailbox2ics, I needed to look at all of the messages
in every mailbox for the user named on the command line, so I simply used
the filter "ALL" with each mailbox. The return value from search() includes
a response code and a string with the message numbers separated by spaces.
A separate call is required to retrieve more details about an individual
message,
such as the headers or body.

    (typ, [message_ids]) = mail_server.search(None, 'ALL')
    message_ids = message_ids.split()

Individual messages are retrieved via fetch(). If only part of the message
is desired (size, envelope, body), that part can be fetched to limit
bandwidth. I could not predict which subset of the message body might
include the attachments I wanted, so it was simplest for me to download the
entire message. Calling fetch() with "(RFC822)" returns the MIME-encoded
version of the message with all headers intact.

    typ, message_parts = mail_server.fetch(
        message_ids[0], '(RFC822)')
    message_body = message_parts[0][1]

Once the message body had been downloaded, the next step was to parse it to
find the attachments with calendar data. Beginning with version 2.2.3, the
Python standard library has included the email package for working with
standards-compliant email messages. There is a straightforward factory for
converting message text to Message objects. To parse the text
representation of an email and create a Message instance from it, use
email.message_from_string().

    msg = email.message_from_string(message_body)

Message objects are almost always made up of multiple parts. The parts of
the message are organized in a tree structure, with message attachments
supporting nested attachments. Subparts or attachments can even include
entire email messages, such as when you forward a message which already
contains an attachment to someone else. To iterate over all of the parts of
the Message tree recursively, use the walk() method.

    for part in msg.walk():
        print part.get_content_type()

Having access to the email package saved an enormous amount of time on this
project. Parsing multi-part email messages reliably is tricky, even with
(or perhaps because of) the many standards involved. With the email
package, in just a few lines of Python, you can parse and traverse all of
the parts of even the most complex standard-compliant multi-part email
message, giving you access to the type and content of each part.

Accessing Calendar Data

The "Internet Calendaring and Scheduling Core Object Specification", or
iCalendar, is defined in RFC 2445. iCalendar is a data format for sharing
scheduling and other date-oriented information. One typical way to receive
an iCalendar event notification is via an email attachment. Most standard
calendaring tools, such as iCal and Outlook, generate these email messages
when you initially "invite" another participant to a meeting, or update an
existing meeting description. The iCalendar standard says the file should
have filename extension ICS and mime-type text/calendar. The input data for
mailbox2ics came from email attachments of this type.

The iCalendar format is text-based. A simple example of an ICS file with a
single event is provided in Listing 1. Calendar events have properties to
indicate who was invited to an event, who originated it, where and when it
will be held, and all of the other expected bits of information important
for a scheduled event. Each property of the event is encoded on its own
line, with long values wrapped onto multiple lines in a well-defined way to
allow the original content to be reconstructed by a client receiving the
iCalendar representation of the data. Some properties also can be repeated,
to handle cases such as meetings with multiple invitees.

In addition to having a variety of single or multi-value properties,
calendar elements can be nested, much like email messages with attachments.
An ICS file is made up of a VCALENDAR component, which usually includes one
or more VEVENT components. A VCALENDAR might also include VTODO components
(for tasks on a to-do list). A VEVENT may contain a VALARM, which specifies
the time and means by which the user should be reminded of the event. The
complete description of the iCalendar format, including valid component
types and property names, and the types of values which are legal for each
property, is available in the RFC.

This sounds complex, but luckily, I did not have to worry about parsing the
ICS data at all. Instead of doing the work myself, I took advantage of an
open source Python library for working with iCalendar data released by Max
M. (maxm@mxm.dk). His iCalendar library (available from codespeak.net)
makes parsing ICS data sources very simple. The API for the library was
designed based

LISTING 1

    BEGIN:VCALENDAR
    CALSCALE:GREGORIAN
    PRODID:-//Big Calendar Corp//Server Version X.Y.Z//EN
    VERSION:2.0
    METHOD:PUBLISH
    BEGIN:VEVENT
    UID:20379258.1177945519186.JavaMail.root(a)imap.example.com
    LAST-MODIFIED:20070519T000650Z
    DTSTAMP:20070519T000650Z
    DTSTART;VALUE=DATE:20070508
    DTEND;VALUE=DATE:20070509
    PRIORITY:5
    TRANSP:OPAQUE
    SEQUENCE:0
    SUMMARY:Day off
    LOCATION:
    CLASS:PUBLIC
    END:VEVENT
    END:VCALENDAR
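As a side note on the line wrapping used by the iCalendar format: RFC 2445 folds a long content line by inserting a line break followed by a single space or tab, and a reader undoes this by removing that pair. The following is only a minimal sketch of the idea, written for Python 3 rather than the Python 2.x used by mailbox2ics. The helper names unfold_ics_lines() and split_property() and the sample text are hypothetical; they are not part of mailbox2ics or of the icalendar library, which takes care of this (and of the corner cases the sketch ignores) for you.

```python
def unfold_ics_lines(text):
    """Rejoin folded iCalendar content lines into logical lines."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            # Continuation line: drop the single leading whitespace
            # character and append the rest to the previous line.
            lines[-1] += raw[1:]
        elif raw:
            lines.append(raw)
    return lines


def split_property(line):
    """Split 'NAME;PARAM=X:value' into (name, params, value).

    Simplification: splits at the first colon, so quoted parameter
    values that themselves contain a colon are not handled.
    """
    head, _, value = line.partition(":")
    name, *params = head.split(";")
    return name.upper(), params, value


# Hypothetical sample with one folded SUMMARY line.
sample = (
    "BEGIN:VEVENT\r\n"
    "SUMMARY:In the office to work with Bob on the\r\n"
    "  project proposal\r\n"
    "DTSTART;VALUE=DATE:20070508\r\n"
    "END:VEVENT\r\n"
)

for line in unfold_ics_lines(sample):
    print(split_property(line))
```

The parameter list is kept separate from the value because properties such as DTSTART carry qualifiers (VALUE=DATE in Listing 1) that change how the value should be interpreted.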


on the email package discussed previously, so working with Calendar
instances and email.Message instances is similar. Use the class method
Calendar.from_string() to parse the text representation of the calendar
data to create a Calendar instance populated with all of the properties and
subcomponents described in the input data.

    from icalendar import Calendar, Event
    cal_data = Calendar.from_string(
        open('sample.ics', 'rb').read())

Once you have instantiated the Calendar object, there are two different
ways to iterate through its components: via the walk() method or
subcomponents attribute. Using walk() will traverse the entire tree and let
you process each component in the tree individually. Accessing the
subcomponents list directly lets you work with a larger portion of the
calendar data tree at one time. Properties of an individual component, such
as the summary or start date, are accessed via the __getitem__() API, just
as
LISTING 2

1. #!/usr/bin/env python
2. # mailbox2ics.py
3.
4. """Convert the contents of an imap mailbox to an ICS file.
5.
6. This program scans an IMAP mailbox, reads in any messages with ICS
7. files attached, and merges them into a single ICS file as output.
8. """
9.
10. # Import system modules
11. import imaplib
12. import email
13. import getpass
14. import optparse
15. import sys
16.
17. # Import Local modules
18. from icalendar import Calendar, Event
19.
20. # Module
21.
22. def main():
23.     # Set up our options
24.     option_parser = optparse.OptionParser(
25.         usage='usage: %prog [options] hostname username mailbox [mailbox...]'
26.         )
27.     option_parser.add_option('-p', '--password', dest='password',
28.                              default='',
29.                              help='Password for username',
30.                              )
31.     option_parser.add_option('--port', dest='port',
32.                              help='Port for IMAP server',
33.                              type="int",
34.                              )
35.     option_parser.add_option('-v', '--verbose',
36.                              dest="verbose",
37.                              action="store_true",
38.                              default=True,
39.                              help='Show progress',
40.                              )
41.     option_parser.add_option('-q', '--quiet',
42.                              dest="verbose",
43.                              action="store_false",
44.                              help='Do not show progress',
45.                              )
46.     option_parser.add_option('-o', '--output', dest="output",
47.                              help="Output file",
48.                              default=None,
49.                              )
50.
51.     (options, args) = option_parser.parse_args()
52.     if len(args) < 3:
53.         option_parser.print_help()
54.         print >>sys.stderr, '\nERROR: Please specify a username, hostname, and mailbox.'
55.         return 1
56.     hostname = args[0]
57.     username = args[1]
58.     mailboxes = args[2:]
59.
60.     # Make sure we have the credentials to login to the IMAP server.
61.     password = options.password or getpass.getpass(stream=sys.stderr)
62.
63.     # Initialize a calendar to hold the merged data
64.     merged_calendar = Calendar()
65.     merged_calendar.add('prodid', '-//mailbox2ics//doughellmann.com//')
66.     merged_calendar.add('calscale', 'GREGORIAN')
67.
68.     if options.verbose:
69.         print >>sys.stderr, 'Logging in to "%s" as %s' % (hostname, username)
70.
71.     # Connect to the mail server
72.     if options.port is not None:
73.         mail_server = imaplib.IMAP4_SSL(hostname, options.port)
74.     else:
75.         mail_server = imaplib.IMAP4_SSL(hostname)
76.     (typ, [login_response]) = mail_server.login(username, password)
77.     try:
78.         # Process the mailboxes
79.         for mailbox in mailboxes:
80.             if options.verbose: print >>sys.stderr, 'Scanning %s ...' % mailbox
81.             (typ, [num_messages]) = mail_server.select(mailbox)
82.             if typ == 'NO':
83.                 raise RuntimeError('Could not find mailbox %s: %s' %
84.                                    (mailbox, num_messages))
85.             num_messages = int(num_messages)
86.             if not num_messages:
87.                 if options.verbose: print >>sys.stderr, '  empty'
88.                 continue
89.
90.             # Find all messages
91.             (typ, [message_ids]) = mail_server.search(None, 'ALL')
92.             for num in message_ids.split():
93.
94.                 # Get a Message object
95.                 typ, message_parts = mail_server.fetch(num, '(RFC822)')
96.                 msg = email.message_from_string(message_parts[0][1])
97.
98.                 # Look for calendar attachments
99.                 for part in msg.walk():
100.                     if part.get_content_type() == 'text/calendar':
101.                         # Parse the calendar attachment
102.                         ics_text = part.get_payload(decode=1)
103.                         importing = Calendar.from_string(ics_text)
104.
105.                         # Add events from the calendar to our merge calendar
106.                         for event in importing.subcomponents:
107.                             if event.name != 'VEVENT':
108.                                 continue
109.                             if options.verbose:
110.                                 print >>sys.stderr, 'Found: %s' % event['SUMMARY']
111.                             merged_calendar.add_component(event)
112.     finally:
113.         # Disconnect from the IMAP server
114.         if mail_server.state != 'AUTH':
115.             mail_server.close()
116.         mail_server.logout()
117.
118.     # Dump the merged calendar to our output destination
119.     if options.output:
120.         output = open(options.output, 'wt')
121.         try:
122.             output.write(str(merged_calendar))
123.         finally:
124.             output.close()
125.     else:
126.         print str(merged_calendar)
127.     return 0
128.
129. if __name__ == '__main__':
130.     try:
131.         exit_code = main()
132.     except Exception, err:
133.         print >>sys.stderr, 'ERROR: %s' % str(err)
134.         exit_code = 1
135.     sys.exit(exit_code)
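The attachment scan in the middle of Listing 2 can be tried without an IMAP server or the icalendar library. The sketch below is an illustration for Python 3, not a port of mailbox2ics: it builds a small message in memory, then walks it looking for text/calendar parts in the same way as the listing. The names ICS_TEXT, build_sample_message(), and calendar_payloads(), along with the addresses and calendar text, are hypothetical. It uses email.message_from_bytes() because Python 3's imaplib returns message data as bytes, where the Python 2 listing uses email.message_from_string().

```python
import email
from email.message import EmailMessage

# Hypothetical calendar text standing in for a real ICS attachment.
ICS_TEXT = (
    "BEGIN:VCALENDAR\r\n"
    "VERSION:2.0\r\n"
    "BEGIN:VEVENT\r\n"
    "SUMMARY:Day off\r\n"
    "DTSTART;VALUE=DATE:20070508\r\n"
    "END:VEVENT\r\n"
    "END:VCALENDAR\r\n"
)


def build_sample_message():
    """Return the raw bytes of an invitation-style email (test data)."""
    msg = EmailMessage()
    msg["Subject"] = "Day off"
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    # Human-readable body, as calendar tools usually include one.
    msg.set_content("Alice is taking a day off on May 8.")
    # The ICS data travels as a text/calendar attachment.
    msg.add_attachment(
        ICS_TEXT.encode("utf-8"),
        maintype="text",
        subtype="calendar",
        filename="event.ics",
    )
    return msg.as_bytes()


def calendar_payloads(raw_message):
    """Yield the decoded bytes of every text/calendar part."""
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        if part.get_content_type() == "text/calendar":
            # decode=True undoes the transfer encoding (base64 here).
            yield part.get_payload(decode=True)


for payload in calendar_payloads(build_sample_message()):
    print(payload.decode("utf-8"))
```

Selecting parts by content type rather than by file name mirrors the choice made in mailbox2ics, and each payload found this way is what the real program hands to Calendar.from_string().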


with a standard Python dictionary. The property names are not case
sensitive.

For example, to print the "SUMMARY" field values from all top level events
in a calendar, you would first iterate over the subcomponents, then check
the name attribute to determine the component type. If the type is VEVENT,
then the summary can be accessed and printed.

    for event in cal_data.subcomponents:
        if event.name == 'VEVENT':
            print 'EVENT:', event['SUMMARY']

While most of the ICS attachments in my input data would be made up of one
VCALENDAR component with one VEVENT subcomponent, I did not want to require
this limitation. The calendars are writable by anyone in the organization,
so while it was unlikely that anyone would have added a VTODO or VJOURNAL
to public data, I could not count on it. Checking for VEVENT as I scanned
each component let me ignore components with types that I did not want to
include in the output.

Writing ICS data to a file is as simple as reading it, and only takes a few
lines of code. The Calendar class handles the difficult tasks of encoding
and formatting the data as needed to produce a fully formatted ICS
representation, so I only needed to write the formatted text to a file.

    ics_output = open('output.ics', 'wb')
    try:
        ics_output.write(str(cal_data))
    finally:
        ics_output.close()

Finding Max M's iCalendar library saved me a lot of time and effort, and
demonstrates clearly the value of Python and open source in general. The
API is concise and, since it is patterned off of another library I was
already using, the idioms were familiar. I had not embarked on this project
eager to write parsers for the input data, so I was glad to have libraries
available to do that part of the work for me.

Putting It All Together

At this point, I had the pieces to build a program to do what I needed. I
could read the email messages from the server via IMAP, parse each message
looking for the ICS attachments, parse them to produce another ICS file,
and import that file into my calendar client. All that remained was to tie
the pieces together and give it a user interface. The source for the
resulting program, mailbox2ics.py, is provided in Listing 2.

Since I wanted to set up the export job to run on a regular basis via cron,
I chose a command line interface. The main() function for mailbox2ics.py
begins on line 22 with the usual sort of configuration for command line
option processing via the optparse module. Listing 3 shows the help output
produced when the program is run with the -h option.

The --password option can be used to specify the IMAP account password on
the command line, but if you choose to use it consider the security
implications of embedding a password in the command line for a cron task or
shell script. No matter how you specify the password, I recommend creating
a separate mailbox2ics account on the IMAP server and limiting the rights
it has so no data can be created or deleted and only public folders can be
accessed. If --password is not specified on the command line, the user is
prompted for a password when they run the program. While less useful with
cron, providing the password interactively can be a solution if you are
unable, or not allowed, to create a separate restricted account on the IMAP
server. The account name used to connect to the server is required on the
command line.

There is also a separate option for writing the ICS output data to a file.
The default is to print the sequence of events to standard output in ICS
format. Though it is easy enough to redirect standard output to a file, the
-o option can be useful if you are using the -v option to enable verbose
progress tracking and debugging.

The program uses a separate Calendar instance, merged_calendar, to hold all
of the ICS information to be included in the output. All of the VEVENT
components from the input are copied to merged_calendar in memory, and the
entire calendar is written to the output location at the end of the
program. After initialization (line 64), merged_calendar is configured with
some basic properties. PRODID is required and specifies the name of the
product which produced the ICS file. CALSCALE defines the date system, or
scale, used for the calendar.

After setting up merged_calendar, mailbox2ics connects to the IMAP server.
It tests whether the user has specified a network port using --port and
only passes a port number to imaplib if the user includes the option. The
optparse library converts the option value to an integer based on the
option configuration, so options.port is either an integer or None.

The names of all mailboxes to be scanned are passed as arguments to
mailbox2ics on the command line after

LISTING 3

    Usage: mailbox2ics.py [options] hostname username mailbox [mailbox...]

    Options:
      -h, --help            show this help message and exit
      -p PASSWORD, --password=PASSWORD
                            Password for username
      --port=PORT           Port for IMAP server
      -v, --verbose         Show progress
      -q, --quiet           Do not show progress
      -o OUTPUT, --output=OUTPUT
                            Output file
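The way the optional port is handled can be seen in isolation with a few lines of optparse, which is still shipped with Python 3. This is a small sketch rather than an excerpt from mailbox2ics: the helper names parse_port() and connection_args() are hypothetical, and the second one only builds the argument tuple that would be passed to imaplib.IMAP4_SSL(), so it can be run without a server.

```python
import optparse


def parse_port(argv):
    """Parse --port from argv; returns (port or None, remaining args)."""
    parser = optparse.OptionParser(usage="usage: %prog [options] hostname")
    # type="int" makes optparse convert the value; with no default,
    # the attribute is None when the option is absent.
    parser.add_option("--port", dest="port", type="int",
                      help="Port for IMAP server")
    options, args = parser.parse_args(argv)
    return options.port, args


def connection_args(hostname, port):
    """Arguments for imaplib.IMAP4_SSL(): omit the port if not given."""
    if port is None:
        return (hostname,)
    return (hostname, port)


port, args = parse_port(["--port", "993", "imap.example.com"])
print(connection_args(args[0], port))

port, args = parse_port(["imap.example.com"])
print(connection_args(args[0], port))
```

Leaving the port out of the call when the user did not supply one lets imaplib apply its own default for IMAP over SSL, which is the same decision Listing 2 makes with its if/else around the constructor.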


the rest of the option switches. Each mailbox name is processed one at a
time, in the for loop starting on line 79. After calling select() to change
the IMAP context, the message ids of all of the messages in the mailbox are
retrieved via a call to search(). The full content of each message in the
mailbox is fetched in turn, and parsed with email.message_from_string().
Once the message has been parsed, the msg variable refers to an instance of
email.Message.

Each message may have multiple parts containing different MIME encodings of
the same data, as well as any additional message information or attachments
included in the email which generated the event. For event notification
messages, there is typically at least one human-readable representation of
the event and frequently both HTML and plain text are included. Of course,
the message also includes the actual ICS file, as well. For my purposes,
only the ICS attachments were important, but there is no way to predict
where they will appear in the sequence of attachments on the email message.
To find the ICS attachments, mailbox2ics walks through all of the parts of
the message recursively looking for attachments with mime-type
text/calendar (as specified in the iCalendar standard) and ignoring
everything else. Attachment names are ignored, since mime-type is a more
reliable way to identify the calendar data accurately.

    for part in msg.walk():
        if part.get_content_type() == 'text/calendar':
            # Parse the calendar attachment
            ics_text = part.get_payload(decode=1)
            importing = Calendar.from_string(ics_text)

When it finds an ICS attachment, mailbox2ics parses the text of the
attachment to create a new Calendar instance, then copies the VEVENT
components from the parsed Calendar to merged_calendar. The events do not
need to be sorted into any particular order when they are added to
merged_calendar, since the client reading the ICS file will filter and
reorder them as necessary. It was important to take the entire event,
including any subcomponents, to ensure that all alarms are included.
Instead of traversing the entire calendar and accessing each component
individually, I simply iterated over the subcomponents of the top-level
VCALENDAR node. Most of the ICS files only included one VEVENT anyway, but
I did not want to miss anything important if that ever turned out not to be
the case.

    for event in importing.subcomponents:
        if event.name != 'VEVENT':
            continue
        merged_calendar.add_component(event)

Once all of the mailboxes, messages, and calendars are processed,
merged_calendar refers to a Calendar instance containing all of the events
discovered. The last step in the process, starting at line 119, is for
mailbox2ics to create the output. The event data is formatted using
str(merged_calendar), just as in the example above, and written to the
output destination selected by the user (standard output or file).

Example

Listing 4 includes sample output from running mailbox2ics to merge two
calendars for a couple of telecommuting workers, Alice and Bob. Both Alice
and Bob have placed their calendars online at imap.example.com. In the
output of mailbox2ics, you can see that Alice has 2 events in her calendar
indicating the days when she will be in the office. Bob has one event for
the day: a meeting scheduled with Alice.

The output file created by mailbox2ics containing the merged calendar data
from Alice and Bob's calendars is shown in Listing 5. You can see that it
includes all 3 events as VEVENT components nested inside a single
VCALENDAR. There were no alarms or other types of components in the input
data.

Mailbox2ics In Production

To solve my original problem of merging the events into a sharable calendar
to which I could subscribe in iCal, I scheduled mailbox2ics to run
regularly via cron. With some experimentation, I found that running it
every 10 minutes caught most of the updates quickly enough
for my needs. The program runs locally on a web server are readily available through the __getitem__() API of the
which has access to the IMAP server. For better security, Calendar instance and it would be simple to compare
it connects to the IMAP server as a user with restricted them against the pattern(s).
permissions. The ICS output file produced is written to a If a large amount of data is involved, either spread
directory accessible to the web server software. This lets across several calendars or because there are a lot of
me serve the ICS file as static content on the web server events, it might also be useful to be able to update an
to multiple subscribers. Access to the file through the existing cached file, rather than building the whole ICS
web is protected by a password, to prevent unauthorized file from scratch each time. Looking only at unread mes-
access. sages in the folder, for example, would let mailbox2ics
skip downloading old events that are no longer relevant
or already appear in the local ICS file. It could then
Thoughts About Future initialize merged_calendar by reading from the local file
Enhancements before updating it with new events and rewriting the
file. Caching some of the results in this way would place
Mailbox2ics does everything I need it to do, for now. less load on the IMAP server, so the export could easily
There are a few obvious areas where it could be en-
hanced to make it more generally useful to other users
LISTING 4
with different needs, though. Input and output filtering

Licensed to 54222 - mehmet yalcin (mehmet2193@yahoo.com)


for events could be added. Incremental update support
1. $ mailbox2ics.py -o group_schedule.ics imap.example.com mailbox2ics  "Calendars.
would help it scale to manage larger calendars. Han- Alice" "Calendars.Bob"
dling non-event data in the calendar could also prove 2. Password:
3. Logging in to "imap.example.com" as mailbox2ics
useful. And using a configuration file to hold the IMAP 4. Scanning Calendars.Alice ...
5. Found: In the office to work with Bob on project proposal
password would be more secure than passing it on the 6. Found: In the office
command line. 7. Scanning Calendars.Bob ...
8. Found: In the office to work with Alice on project proposal
At the time of this writing, mailbox2ics does not offer
any way to filter the input or output data other than by
controlling which mailboxes are scanned. Adding finer- LISTING 5
grained filtering support could be useful. The input data
1. BEGIN:VCALENDAR
could be filtered at two different points, based on IMAP 2. CALSCALE:GREGORIAN
rules or the content of the calendar entries themselves. 3. PRODID:-//mailbox2ics//doughellmann.com//
4. BEGIN:VEVENT
IMAP filter rules (based on sender, recipient, subject 5. CLASS:PUBLIC
6. DTEND;VALUE=DATE:20070704
line, message contents, or other headers) would use the 7. DTSTAMP:20070705T180246Z
capabilities of IMAP4.search() and the IMAP server without much effort on my part. All that would be needed are a few command line options to pass the filtering rules, or code to read a configuration file. The only difference in the processing by mailbox2ics would be to convert the input rules to the syntax understood by the IMAP server and pass them to search().

Filtering based on VEVENT properties would require a little more work. The event data must be downloaded and checked locally, since the IMAP server will not look inside the attachments to check the contents. Filtering using date ranges for the event start or stop date could be very useful, and not hard to implement. The Calendar class already converts dates to datetime instances. The datetime package makes it easy to test dates against rules such as "events in the next 7 days" or "events since Jan 1, 2007".

Another simple addition would be pattern matching against other property values such as the event summary, organizer, location, or attendees. The patterns could be regular expressions, or a simpler syntax such as globbing. The event properties, when present in the input,

DTSTART;VALUE=DATE:20070703
LAST-MODIFIED:20070705T180246Z
LOCATION:
PRIORITY:5
SEQUENCE:0
SUMMARY:In the office to work with Bob on project proposal
TRANSP:TRANSPARENT
UID:9628812.1182888943029.JavaMail.root(a)imap.example.com
END:VEVENT
BEGIN:VEVENT
CLASS:PUBLIC
DTEND;VALUE=DATE:20070627
DTSTAMP:20070625T154856Z
DTSTART;VALUE=DATE:20070626
LAST-MODIFIED:20070625T154856Z
LOCATION:Atlanta
PRIORITY:5
SEQUENCE:0
SUMMARY:In the office
TRANSP:TRANSPARENT
UID:11588018.1182542267385.JavaMail.root(a)imap.example.com
END:VEVENT
BEGIN:VEVENT
CLASS:PUBLIC
DTEND;VALUE=DATE:20070704
DTSTAMP:20070705T180246Z
DTSTART;VALUE=DATE:20070703
LAST-MODIFIED:20070705T180246Z
LOCATION:
PRIORITY:5
SEQUENCE:0
SUMMARY:In the office to work with Alice on project proposal
TRANSP:TRANSPARENT
UID:9628812.1182888943029.JavaMail.root(a)imap.example.com
END:VEVENT
END:VCALENDAR
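The date-range rules mentioned in the text ("events in the next 7 days", "events since Jan 1, 2007") could be sketched roughly as follows. This is only an illustration, not part of mailbox2ics: the event tuples and the helper names in_next_days, since and filter_events are hypothetical, and the start dates are plain datetime.date values like those carried by the all-day events in the sample output.

```python
import datetime

# Hypothetical stand-ins for parsed VEVENT data: (summary, start date).
EVENTS = [
    ("In the office", datetime.date(2007, 6, 26)),
    ("In the office to work with Alice on project proposal",
     datetime.date(2007, 7, 3)),
]


def in_next_days(start, days, today):
    """True if start falls between today and today + days (inclusive)."""
    return today <= start <= today + datetime.timedelta(days=days)


def since(start, cutoff):
    """True if start is on or after the cutoff date."""
    return start >= cutoff


def filter_events(events, rule):
    """Return the summaries of events whose start date satisfies rule."""
    return [summary for summary, start in events if rule(start)]


# A fixed "today" keeps the example reproducible.
today = datetime.date(2007, 6, 25)
upcoming = filter_events(EVENTS, lambda d: in_next_days(d, 7, today))
recent = filter_events(EVENTS, lambda d: since(d, datetime.date(2007, 1, 1)))
```

Keeping each rule as a small function that takes a date means new rules can be added without touching the loop that walks the events.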


be run more frequently than once every 10 minutes.

In addition to filtering to reduce the information included in the output, it might also prove useful to add extra information by including component types other than VEVENT. For example, including VTODO would allow users to include a group action list in the group calendar. Most scheduling clients support filtering the to-do items and alarms out of calendars to which you subscribe, so if the values are included in a feed, individual users can always ignore the ones they choose.

As mentioned earlier, using the --password option to provide the password to the IMAP server is convenient, but not secure. For example, on some systems it is possible to see the arguments to programs using ps. This allows any user on the system to watch for mailbox2ics to run and observe the password used. A more secure way to provide the password is through a configuration file. The file can have filesystem permissions set so that only the owner can access it. It could also, potentially, be encrypted, though that might be overkill for this type of program. It should not be necessary to run mailbox2ics on a server where there is a high risk that the password file might be exposed.

Conclusion

Mailbox2ics was a fun project that took me just a few hours over a weekend to implement and test. This project illustrates two reasons why I enjoy developing with Python. First, difficult tasks are made easier through the power of the "batteries included" nature of Python's standard distribution. And second, coupling Python with the wide array of other open source libraries available lets you get the job done, even when you encounter those times when the Python standard library lacks the exact tool you need. Using the ICS file produced by mailbox2ics, I am now able to access the calendar data I need using my familiar tools, even though iCalendar is not supported directly by the group's calendar server.

Doug Hellmann is a Senior Software Engineer at Racemi. He has been programming in Python since version 1.4 on a variety of Unix and non-Unix platforms. He has worked on projects ranging from mapping to medical news publishing, with a little banking thrown in for good measure.



FEATURE

Processing Web Forms Using Anonymous Functions & WSGI
by Kevin T. Ryan

If you're a web developer, you're well aware of the importance of forms in web development.

Not only are they a valuable tool in gathering information from your users, but they can
also be used for thousands of other purposes (e.g., running a survey to see what your users
think of your site). This article will demonstrate how to use anonymous functions (commonly
known as "lambda" functions) to assist in the creation of SQL statements based on the
values received from web forms. We will demonstrate this in the context of a WSGI compliant
framework or component. Though WSGI by now has become well known throughout the
Python community, there still seems to be a cloud of mystery over parts of the spec. We'll
discuss some of the details of the spec that relate to processing form submissions in the
hopes of providing a better understanding of how WSGI fits into the bigger picture.

REQUIREMENTS

PYTHON: 2.x

Useful/Related Links:
WSGI specification: http://www.python.org/dev/peps/pep-0333/

WSGI - One of Python's Greatest Strengths

Maybe you already have an idea of what WSGI is, but what exactly does it have to do with processing forms? Everything. WSGI allows us to create "middleware" fairly easily to assist with anything from URL mapping to authentication to form processing to <insert idea here>. That's what WSGI is all about, and that's why it should be an important part of your repertoire. What do I mean by that statement? Let me give you an example:

Suppose you're building a web application that collects information from users - but only from registered users. And let's further assume that you want to be able to maintain internal state across HTTP requests (e.g., so people don't have to keep on logging in to use the site). To meet these fundamental needs, you will probably need the following:

• something to map URLs to internal functions
• an authentication mechanism
• something to do form processing
• and probably a lot more!

If you are using components that are WSGI compliant, then you're in great shape! Why? Well, because any WSGI component can be called through the same interface:

"function_name(<environ>, <start_response>)"

If you'd like more background on WSGI, check out http://www.python.org/dev/peps/pep-0333/ which contains the PEP describing WSGI.

With that bit of information in hand, it follows that you can integrate the various pieces of your components by building them one on top of another:

• The URL mapper will accept <environ>, <start_response> and figure out what function to call based on the environment it is given as its first argument (more on that later). So the URL mapper calls the function you have designated as accepting requests for this URL.
• Since you want only authenticated users to be able to access this page, you can simply get a WSGI Authentication Middleware agent (or create one yourself - more on that later as well) and authenticate. If the user passes the test, continue on. Otherwise, send them to a log-in screen.
• Now that you've authenticated your user, you can continue to process the form they've submitted. Again, if this particular part of the framework is WSGI compliant, those functions will be accepting the same arguments as before, and the webpage from step 2 can pass this information along and let the "Forms Middleware" do its thing.

Moving forward, we will begin to build the form middleware we just spoke about in step 3 above, and we'll finish up by talking about using the information provided by the middleware with anonymous functions to build SQL statements that can be used in your web application.

Building Middleware - Forms

So, now that we have a good understanding of what we're trying to accomplish and why, let's get on with it. To begin, we know that, to be WSGI compliant, we have to accept 2 arguments: an environment variable, and a start_response variable. The only one we'll need to be concerned with at this point is the former. The latter is used when you are all finished and ready to complete the request of the user, which we are not ready to do at this point. Remember, we are building a middleware component here and some other function will have to complete the request later on.

To begin, we are going to build our middleware out of some simple components:

• data types
• fields
• forms

Each piece will be built on top of (or out of) the previous pieces. For example, fields are built with the help of data types, and forms are built out of fields. So let's start with the simplest component: data types. Since we are assuming that your site is SQL based, this part of the puzzle should be fairly straightforward. Essentially, we want each data type to do two things:

• Validate values provided against values allowed.
• Provide a helpful error message if the value provided was no good.

Pretty easy, eh? Let's define a base abstract class (you can place it in Forms.py, but see below for the actual code):

class DataType(object):
    def __init__(self):
        pass

    def validates(self, value):
        # Subclasses must override this method
        raise NotImplementedError("Must subclass DataType")

LISTING 1

class Integer(DataType):
    def validates(self, value):
        try:
            int(value)
            return value
        except ValueError:
            raise ValueError("Must be whole number (eg, 100)")

class Varchar(DataType):
    def __init__(self, length=255):
        self.length = length

    def validates(self, value):
        if len(value) <= self.length:
            return value
        else:
            raise ValueError("Must be no longer than %d characters" % self.length)
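As a quick check of how these data types behave, here is a small self-contained sketch. It restates DataType, Integer and Varchar from Listing 1 (written so that it also runs on Python 3) and adds two names that are not in the article: a hypothetical subclass, Broken, that forgets to override validates, and a helper, check, that turns the raised exception into a message.

```python
class DataType(object):
    def validates(self, value):
        raise NotImplementedError("Must subclass DataType")


class Integer(DataType):
    def validates(self, value):
        try:
            int(value)
            return value
        except ValueError:
            raise ValueError("Must be whole number (eg, 100)")


class Varchar(DataType):
    def __init__(self, length=255):
        self.length = length

    def validates(self, value):
        if len(value) <= self.length:
            return value
        raise ValueError("Must be no longer than %d characters" % self.length)


class Broken(DataType):
    """Hypothetical subclass that forgets to implement validates."""
    pass


def check(datatype, value):
    """Return None if the value validates, otherwise the error message."""
    try:
        datatype.validates(value)
        return None
    except (ValueError, NotImplementedError) as e:
        return str(e)


ok = check(Integer(), "100")
bad_int = check(Integer(), "ten")
too_long = check(Varchar(5), "abcdefgh")
unimplemented = check(Broken(), "anything")
```

The last call is the interesting one: because Broken inherits the base method, the mistake surfaces as a clear error the first time the type is used instead of silently accepting every value.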
Why define a base abstract class? Well, it helps ensure that all of our data types comply with the standard interface (e.g., if a subclass tried to ignore the "validates" method, our users wouldn't be able to ensure a value is valid). To see some standard data types that will be useful going forward, see Listing 1.

These are pretty simple, but you get the idea. You may even want to provide better error checking - for example, the Integer type will allow you to pass floats without complaining, but you might not want that. It all depends on what you'll be using the data for, but I'll leave it up to you to define more types and perhaps better error checking.

Essentially, what these two examples do is provide a service: they make sure that values passed to the form comply with certain data standards. Also, note that their 'validates' method will raise an exception if there is an error in processing the form data. That is, either the value comes back untouched and we know the field validates, or we get an error that we can pass back up the chain for invalid data that we can't handle. This will become useful when we develop our fields and forms. The next step is to use the data types in a field. To see how that's defined, see Listing 2.

LISTING 2

class Field(object):
    def __init__(self, name, datatype, required=False):
        self.name = name
        self.datatype = datatype
        self.required = required

    def validates(self, value_provided):
        if self.required and not value_provided:
            raise ValueError("Can not be left blank")
        elif value_provided:
            return self.datatype.validates(value_provided)
        else:
            return value_provided

This simple (but very functional) class allows us to define a new field that can be used in a form. Once the form information is sent back to us, this class will do a lot of the legwork in validating that the information given to us is good. It will perform the following checks:

• If a field was required, was there a value provided?
• If a value was provided, does it comply with the expected type?

Note that checking for required fields is as simple as checking 'if self.required and not value_provided'. This is because Python is very flexible and adept at understanding True/False values in this type of environment. If a blank string was sent (''), 'not value_provided' will return True, thus kicking off an exception about the lack of an entry. The error message can be later used in returning feedback to the web user or can be used for internal purposes. Now you can begin to see why it was convenient to develop our datatype interface for the 'validates' method in a similar fashion. To verify the appropriateness of the value (after we've checked that it has been provided), all we have to do is run the validates method of the datatype and return that, as both methods subscribe to the same interface.

Now we are beginning to see things come together. The final step in this puzzle is to develop a 'Form' object that can link the fields together, validate the form as a whole and maybe provide some nice helpers as well. Also, this is where we'll look into the "mysterious environ" variable that I've been referring to throughout this article. But first, you can define the class as shown in Listing 3.

LISTING 3

# Forms.py

import cgi

class Form(object):
    def __init__(self, fields):
        self._fields = fields
        self._values = None # to be provided later

    def validates(self, environ, start_response=None):
        self._values = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
        errors = {}
        for field in self._fields:
            value = self.getvalue(field.name)
            try:
                field.validates(value)
            except Exception, e:
                errors[field.name] = e.args[0]
        return errors

    # The following 2 functions will be helpful later

    def fields(self):
        return self._fields

    def getvalue(self, fieldname, default=None):
        try:
            return self._values.getvalue(fieldname, default)
        except AttributeError:
            raise TypeError("Must populate the form before you can get values")

And that's basically it! The class boils down to:

• Create the form with a list of fields, which we assume (although we do not check for it above explicitly) are instances of our Field object.
• Populate the form with the values provided in the form.
• Validate the values given.

Step 2 probably requires a bit more discussion, so we'll do that in the next section. However, apart from that one exception, the class should be pretty easy to follow. We ensure that we are WSGI compliant with our 'validates' method by accepting both 'environ' and 'start_response' arguments and returning an iterable of strings (the errors dictionary). The validates method will check that appropriate values were provided throughout the form. We
rely heavily on our 'Field' class as well as our 'DataType' class (although the latter is not self-evident from above, we know it to be the case). Note that we use a trick to return our values to the user: we rely on the fact that the caller of our function can test for errors simply by determining if they received the empty dictionary.

So, the interface is simple: we return a mapping of the field names to any errors encountered from that field. If no errors are encountered, an empty dict is returned. Furthermore, each error encountered contains vital information to the caller: which field contained an error (i.e., the key of the dictionary), and a descriptive message (provided by the underlying classes) telling the end user what the problem with the field was.

You may not think that being WSGI compliant above is terribly important, but what if you are writing a giant framework or website instead of just looking at this one example? Knowing that each and every component you deal with complies with the same interface, enabling you to "just use the component for the purpose it serves", is compelling. Also, since we've made the 2nd parameter optional, anyone who knows about our interface can just call the function with the first argument and leave the 2nd blank. Those who would like to call it blindly without knowing the interface specifically can call it with the default WSGI arguments and all is well. Now that we've discussed the framework, let's move on to discuss in a little more detail the environ variable we so cleverly used with Python's built-in cgi module.

LISTING 4

import re

from Forms import Varchar  # the base data type defined in Forms.py

class Email(Varchar):
    # The dot before the top-level domain is escaped so that it only
    # matches a literal '.'
    email_pattern = re.compile(r'^([a-zA-Z0-9_.\-+])+@(([a-zA-Z0-9-])+\.)+([a-zA-Z0-9]{2,4})+$')

    def validates(self, value):
        value = super(Email, self).validates(value)

        # Further error checking specific to emails

        if self.email_pattern.match(value):
            return value
        else:
            raise ValueError("Must be a valid email (eg, 'john_doe@myhost.com')")

WSGI - The Environ Variable & CGI

Above, we created a form class that we'll be using later to process data received from a web user. Within that class we used Python's cgi module to give our Form class legs in terms of getting at the data the user sent to us via the form. So what exactly is in that 'environ' variable that is sent as part of every WSGI call? Well, as Ben Bangert (http://groovie.org/) so aptly put it:

"environ is merely a dict that's loaded with some common CGI environment variables"

So it's as simple as thinking of it as a dictionary with some predefined and available keys. To find out precisely which keys must be available (if it is truly WSGI compliant) see: http://www.python.org/peps/pep-0333.html#environ-variables. You'll note that as part of that list of required keys, there are keys specific to WSGI that must also be present. Of note is the 'wsgi.input' key, which should contain:

"An input stream (file-like object) from which the HTTP request body can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client's request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)"
(Taken directly from the WSGI PEP)

We also know from reading the cgi module's source that the FieldStorage class can be instantiated with a file pointer (fp, which defaults to sys.stdin) as well as an environment (environ, which defaults to os.environ). Since the environment we're given as part of the WSGI
protocol contains such a file pointer and an environment variable, we pass them explicitly to the FieldStorage call. The cgi module takes over from there, and graciously provides us with a dictionary-like object that contains all the values sent by the user via the form!

How To Use the Form Class & Anonymous Functions to Process Data

We've now come to the point in this article where we have a WSGI component that can process and validate forms in a WSGI-compliant way. Note that if you wanted to, you could just as easily use another WSGI component that acted as middleware to process form submissions - again, that's the beauty of WSGI! But we'll use our own classes here because they are simple to use, easy to extend and well within the context of this article.

So, how do we use the middleware? Easy: let's assume that you are using a URL mapper (as discussed above) such as Selector (http://lukearno.com/projects/selector/) and that you've mapped http://localhost:8080/my_form to a function in your application called 'process_my_form'. Let's further assume that on the 'my_form' HTML page, you are gathering information from your users (e.g., first name, last name and email so that you can send personalized email to everyone who visits your site). So the form portion of the HTML page might look like the following:

<form method="post" action="/my_form">
    <input type="text" name="first_name" />
    <input type="text" name="last_name" />
    <input type="text" name="email" />
    <input type="submit" />
</form>

Simple enough. Now, within the module that contains the 'process_my_form' function definition, you may want better error checking for email submissions than what is provided by the 'Varchar' datatype class we've defined above. Let's go ahead and extend the Varchar as shown in Listing 4.

You can see how easy it would be to do the same for phone numbers, unique fields, etc. Going into the explanation behind the regular expression above would be outside the scope of this article, but I can refer you to http://www.dustindiaz.com/update-your-email-regexp (which is where I think I grabbed it from in the 1st place). Now we have a data type that extends our original specification to check for valid emails. We'll use that to develop our Form instance as shown in Listing 5.

LISTING 5

import Forms

# Here is the function that will process every request coming to '/my_form'
# url:
def process_my_form(environ, start_response):
    # You may want to make this a global variable so that it is computed only
    # once, instead of every time the function is called to respond to a url
    # request from the user.
    f = Forms.Form([
        Forms.Field('first_name', Forms.Varchar(50), required=True),
        Forms.Field('last_name', Forms.Varchar(75), required=True),
        Forms.Field('email', Email(), required=True)
    ])
    errors = f.validates(environ)
    if errors:
        # We'll assume you have defined a function that will show the form
        # with errors (and maybe re-populate the form with values the user has
        # already provided) in another function
        return show_form_with_errors(errors)
    else:
        process_updates(f)
        # We'll also assume you have built a function to tell the user you
        # have succeeded in gathering the information.
        return show_successful_submission_form() # Success!

It is clear that defining our framework made things a lot easier (although we haven't built the 'process_updates' function yet - but we will). You'll notice that if there are errors, errors will evaluate as true (i.e., things are *not* okay) and will map the problematic field names to their error messages. For example, if a user were to provide an email similar to the following:

bad@hostcom

and all other fields (first and last name in our example) were fine, the resulting error dictionary would look like:

{'email': "Must be a valid email (eg, 'john_doe@myhost.com')"}

You could then use this to regenerate the form letting them know that the email field contained a bad value and they need to fix it. Helpful error messages can go a long way in making things go as smoothly as possible. But we are still left to our own devices to generate the SQL used to take the data from the user and put it into the database. This is where we will begin to use anonymous functions (or "lambdas") to help us again. It might be overkill for the current example, but we'll move onto something more substantial once you've seen the technique in action.

Lambda functions are Python's way of representing anonymous functions (of single expressions, at least). We also know that there is a standard protocol for entering values into our database (at least, there should be a standard protocol for entering values into your database). So one easy way to enter the information into the database is shown in Listing 6.

LISTING 6

def process_updates(form):
    curs = CONNECTION.cursor()
    sql_map = {
        'first_name' : lambda value: ('first_name', value),
        'last_name' : lambda value: ('last_name', value),
        'email' : lambda value: ('email', value),
    }
    sql = "INSERT INTO user_table (%s) VALUES (%s)"
    columns = []
    values = []
    for field in form.fields():
        # Check if the value was provided by the user and add it to our lists
        # if it was
        value = form.getvalue(field.name)
        if value:
            name, val = sql_map[field.name](value)
            columns.append(name)
            values.append(val)
    # One "%s" placeholder per value, separated by commas
    sql = sql % (", ".join(columns), ", ".join(["%s"] * len(values)))
    # sql now equals "INSERT INTO user_table (first_name, last_name, email) VALUES (%s, %s, %s)"
    curs.execute(sql, values)
    CONNECTION.commit()

Now, that's quite a mouthful! Essentially, what we tried to do was link everything together so that the only thing we would need to change if our table were to change is the 'sql_map' dictionary. In that dictionary, we stored a list of columns and where we would like them to go in the insert statement. So, if we decided to later add 'phone number' to our database table, all we would have to do (besides adding a matching Field to the form) is add:

'phone_number' : lambda value : ('phone_number', value),

to the end of our dictionary, and our code is updated automatically! As I mentioned before, this might be overkill for this example because it is somewhat trivial. But to see how it might work in a larger example, consider searching through your records to find something based on input provided by the end user. Let's say that you were creating a search form that the users could use to search for ratings left by other users (for an example, see http://www.portss.com/searchform). You have several search fields that each might or might not be provided by the end user (e.g., they may want to search for contractors who operate in Pennsylvania but they don't care about anything else). Your standard search SQL might look like the following (adapted, yet slightly modified from my work on the portss.com website):

sql = '''
    Select Distinct
        name, service, ... etc.
    From
        contractors
    Where
        %s
'''

Pretty straightforward. But now you get into each of the additional filters that need to be applied, depending on the query sent by the end user. So we might set up a filter map as follows:

filter_map = {
    'name' : lambda val : ('company LIKE %s', '%' + val + '%'),
    'city' : lambda val : ('city LIKE %s', '%' + val + '%'),
    'state' : lambda val : ('state = %s', val),
}

The 'LIKE' for name and city are an easy way of searching for values across the entire database for cases where the user may only know part of the name (e.g., they might know only the last name of the person they're looking for). The one current challenge is that our search is case-sensitive, but we'll come up with a way to deal with that in a minute. So how do we link these two items together? Just like we did before - see Listing 7 for how to do it.

LISTING 7

def search_for_contractor(environ, start_response):
    curs = CONNECTION.cursor()
    form = Forms.Form([
        Forms.Field('name', Forms.Varchar(50)),
        Forms.Field('city', Forms.Varchar(75)),
        Forms.Field('state', Forms.Varchar(20))
    ])
    errors = form.validates(environ)
    if errors:
        # tell user what to do
        return show_form_with_errors(errors)
    filters = []
    values = []
    for filterable_field in filter_map:
        val = form.getvalue(filterable_field)
        if val:
            what_to_do, val = filter_map[filterable_field](val)
            filters.append(what_to_do)
            values.append(val)
    # Now that we've processed all the values, let's build the SQL statement.
    # Note: this assumes at least one filter was supplied; an empty list would
    # leave a dangling "Where" clause.
    query = sql % (" And ".join(filters))
    # I'll also assume you know how to set up a cursor to execute the stmt
    curs.execute(query, values)
    results = curs.fetchall()
    start_response("200 OK",[('Content-Type', "text/html")])
    # Here we would return an iterable (e.g., a Cheetah Template) with the values
    # filled in from the "results" variable.
    return render_template(name='search_for_contractor', results=results)

Now you can begin to see how this might be more extensible than putting in a ton of 'if this_value exists, add this clause to the sql statement' and figuring out where to put the 'AND'. For example, if you checked everything manually you would need to figure out if something had been added, then add the 'and' onto the front and continue with the other filter, otherwise just show the filter. Furthermore, if you add additional filters to the HTML form, the only thing you need to change (again!) is the filter_map. It turns out to be very convenient to do this when building these kinds of applications. You can also see why we return the string and the 'val' as part of each of the lambda functions shown above - it makes it easy to transform the value before putting it into our SQL. In the example above, we were able to transform the string to utilize SQL's "LIKE" statement. But we still had the problem of case-sensitive searches. What if instead you decided to store all names in your database (e.g., first names, last names, etc.) in lowercase? Maybe you decided this after your search didn't seem to be working for certain cases (e.g., the name 'Kevin' was in the database, but the user was searching for 'kevin'). Well, in that case you can update your database changing all
values to lower case. For example, in PostgreSQL this can be accomplished as follows:

Update my_table Set
    first_name = lower(first_name),
    last_name = lower(last_name),
    ... etc.

Then you could change your lambdas to the following:

'first_name' : lambda value : ('first_name = %s', value.lower()),

And everything would be set! Now the query will return correct results even when the user types something in all lowercase (or uppercase, etc.) as all values provided by the user will be converted in your anonymous function and the query will update itself. Again, the only thing you would have to change is the filter_map and you'd be good to go. You still might have to figure out how to present the information, but that's the subject of another discussion.
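To see the whole technique in one place, here is a small self-contained sketch of the filter-map idea with lowercased values. It is an illustration under a few assumptions rather than the code behind portss.com: it uses the standard library's sqlite3 module with an in-memory database, so the placeholder is ? instead of %s; the contractors table and its rows are made up; and build_query is a hypothetical helper name.

```python
import sqlite3

# Each entry maps a form field to a function returning
# (SQL clause, transformed value). Values are lowercased to match
# the lowercase data stored in the table.
FILTER_MAP = {
    'name': lambda val: ('name LIKE ?', '%' + val.lower() + '%'),
    'city': lambda val: ('city LIKE ?', '%' + val.lower() + '%'),
    'state': lambda val: ('state = ?', val.lower()),
}


def build_query(submitted):
    """Build a parameterized SELECT from the fields the user filled in."""
    filters, values = [], []
    for field in sorted(FILTER_MAP):
        val = submitted.get(field)
        if val:
            clause, val = FILTER_MAP[field](val)
            filters.append(clause)
            values.append(val)
    sql = "SELECT name FROM contractors"
    if filters:  # skip the WHERE clause entirely when nothing was supplied
        sql += " WHERE " + " AND ".join(filters)
    return sql + " ORDER BY name", values


# Made-up data, stored in lowercase as the text suggests.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contractors (name TEXT, city TEXT, state TEXT)")
conn.executemany("INSERT INTO contractors VALUES (?, ?, ?)", [
    ("kevin's plumbing", "pittsburgh", "pa"),
    ("acme electric", "philadelphia", "pa"),
    ("kevin's roofing", "trenton", "nj"),
])

# The user typed mixed case and left the city blank.
sql, values = build_query({'name': 'Kevin', 'state': 'PA', 'city': ''})
rows = [r[0] for r in conn.execute(sql, values)]
```

Adding a new search field is then a one-line change to FILTER_MAP, and the user-supplied values never get interpolated into the SQL string itself; they are always passed to the driver as parameters.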

Conclusion
In concluding, let's briefly recap what we've seen:

• abstracting out the form functionality is useful in increasing code reuse;
• doing so in a WSGI compliant fashion is easy and smart, because it is now interchangeable with other WSGI compliant form components without changing any of your code;
• using anonymous functions to process forms is easy and makes your code very maintainable. It also helps keep the logic all in one place, so it's very easy to update as well.

I hope this article gave you a flavor of what it means to be WSGI compliant Middleware and that it will help you in developing future websites. Good luck!

Kevin T. Ryan is a CPA by day, programmer by night. He has successfully been able to integrate his passion (programming) into his work (accounting) by using data mining as the bridge. He has also created a website to help people find contractors (e.g., plumbers, electricians, etc.) at http://www.portss.com.



FEATURE

Extending Python
Using C to Make Python Smarter

by John Berninger

So you need to do something in Python, but all you have available is a C library
API to deal with the actual data? Not to worry - Python can easily be extended
to work with that API. Just goes to show you, sometimes you can teach an old
dog new tricks!

So, the stock distribution of Python isn't good enough for you, hmm? Well, that's not too surprising - it wasn't good enough for me, either. Naturally, I decided to Do Something about it - I taught Python a few new tricks by writing a new module specifically for what I wanted to do - and it did indeed make my life easier!
I'm going to create a new module that duplicates functionality already available in Python modules as an example, so please forgive the seeming duplication of effort. It's easier to make sense of things using "Hello, World!" examples.

REQUIREMENTS
PYTHON: 2.5
Other Software:
• gcc, make, and standard build environment tools
• Python header files for the version you're building against

Basic Module Requirements and Setting up the Environment
First, some basic requirements for writing any new Python module: First, you need a compiler available, and Python development headers that you can compile against. In the Fedora Core Linux world, which is the world I live in, that consists of standard development tools like GCC, and make, and the python-devel package.
Specifically, the environment I've set up is using a default installation of Fedora 7. Getting the right environment after the base OS install does take some work, but I'll show you the commands I used to get there. Once the default OS is installed, I added the "rpmdevtools" package with the command "yum install rpmdevtools", which I use for Fedora packaging. This package required the 'fakeroot' package be installed for dependencies, and also required updates on the following packages (again for dependencies):

• elfutils
• elfutils-libelf


• elfutils-libelf-devel
• elfutils-libs
• popt
• rpm
• rpm-build
• rpm-devel
• rpm-libs
• rpm-python

Now that these packages are installed, I removed a series of superfluous -devel packages. This was mostly to ensure a clean RPM build environment, and is not directly related to Python extension development, so it's not absolutely required. I suggest you remove as many -devel packages as you feel comfortable removing, however. When all was said and done, I had only the following -devel packages left:

• libstdc++-devel-4.1.2-12
• python-devel-2.5-12.fc7
• glibc-devel-2.6-3
• perl-devel-5.8.8-18.fc7

The libstdc++ and glibc devel packages are required by gcc, so we can't remove those. The python-devel package is the one we're interested in, so removing it would defeat our purpose here. We could probably remove the perl devel package and its dependencies, but I chose not to simply because I tend to leave perl alone - the OS is too dependent on perl and python for me to be completely comfortable removing packages that I'm not sure of.
The last part of the process will be installing libraries or -devel packages required for the extensions you will be writing. Since our examples here will be very simplistic (i.e., they won't be making any library calls), we won't need any additional -devel packages installed (or reinstalled, as the case may be).

Starting Development
Once you have all those prerequisites installed, you can start developing your module using C code. You'll need to pull in the Python.h header file to get the Python module definitions, like so:

    #include <Python.h>

One of the first things you'll want to do then is to declare a static pointer to a Python error object. As everyone reading this knows, programmers make mistakes. They call functions with incorrect parameters, or the wrong number of parameters, or whatever. We need a nice way to tell them they made a mistake, and that's done with the error object:

    static PyObject *ErrorObject;

This will ensure that this error/exception object is unique to your module; although not strictly necessary, it is considered impolite to pollute someone else's error space.

Aside: Why not use a binding generator?
There are many programs out there that are designed to take a library interface in one language and create bindings for it in another language. Python is no exception, so you might be wondering why we're going about doing this "the hard way" by writing all of our binding code ourselves, and not letting a generator do all the heavy lifting.
To be perfectly honest, for many purposes, a generator will work just fine. Most of them are designed to give you as close to an identical interface in the target language as was found in the source language, and most of them do exactly that with a minimum of fuss and bother. Just because you can use a generator and give yourself Python bindings, however, doesn't always mean you'll understand what goes into the bindings.
I'm a firm believer in understanding what's happening instead of relying on other people who try to tell me "Don't think about this, just let me work my magic and you'll be fine." Well, I'm like that with computer things anyway - my car's engine is a black box to me, and will likely stay that way until the heat death of the universe. But with computers, and programming in particular, I want to know what's inside that black box.
I also think it's important to understand how to write your own bindings in case the generator either doesn't work the way you expect it to, or you need to make your bindings do something the generator doesn't understand. In either of those cases, you'll need to be able


to get down and dirty in the C code itself and figure out what makes the module tick, and what to turn sideways to get it to tock, as well as tick.
So, are generators or translators useful? Absolutely. Are they applicable to your needs? Probably, but they might not always be - and it's those few times when they're not applicable that you're going to desperately need something that will let you finish the project by noon tomorrow for a presentation to the Board of Directors just before the big company-wide rollout announcement. So by all means, find the generators and translators and what-nots. But please, first understand what they're doing to you and for you.

Making A Python-callable function
All Python-callable functions are declared static and return a pointer to a PyObject (returning NULL tells the interpreter that an exception has been set). They take two parameters, both pointers to a PyObject: the first is the "self" object, which simple module-level functions like ours can ignore, and the second is the argument list. Condensing all of that down to the actual function declaration, we get this:

    static PyObject *myfunc(PyObject *self, PyObject *args)

You should always name your routines something descriptive - this is just as true in C as in Python (or in any other language). The "myfunc" name above should generally not be used unless (like here) you're just giving examples. Always pick names that make sense for the module you're writing and for what the given function is doing.
Now for a quick diversion into 'self-documenting' code. We all know there's no such thing as true self-documenting code, but there is a really easy way to make documentation strings available to Python interpreters for your new module. For any given routine, creating a documentation string is as easy as declaring another variable with a specially-formatted name, like so:

    static char myfunc__doc__[] = "This is the documentation string for the Python function myfunc()\n";

So not only do you have information available for the end user making use of your module inside of a Python interpreter, you have some documentation to remind you of just what the heck you were smoking when you initially wrote this function. This is especially helpful when you have to go back and rewrite a module 2 years after forgetting all about it.
Most modules will place the documentation string immediately above the function it's describing, but this is just convention - the string definition can be anywhere in relation to the function definition except inside the function itself.

For Our Next Trick, A Function That Does Work
So now that we understand how to define functions, let's write one that actually does something, then examine it in detail. See Listing 1 for the fully functional code.

LISTING 1

    static char isEven__doc__[] =
        "Determines if a number is even - if so, returns '1' (for TRUE), "
        "else returns 0 (for FALSE)\n";

    static PyObject *isEven(PyObject *self, PyObject *args) {
      int inputValue;
      int returnValue;

      if (!PyArg_ParseTuple(args, "i", &inputValue)) {
        PyErr_SetString(PyExc_ValueError, "Argument parsing error");
        return NULL;
      }

      if ( 0 == inputValue%2 ) {
        returnValue=1;
      } else {
        returnValue=0;
      }

      return Py_BuildValue("i", returnValue);
    }

The first thing we see is the documentation string. This tells us what the function will be doing: determining if a given number is even or odd. Yes, this is a really trivial function that's already available in Python - I did this deliberately so I could teach concepts and not have to worry about teaching behavior.
Next, we see the function definition exactly as above, save for the function name. After that we have declarations for a couple of variables we'll use inside the function. Pretty generic C code so far. The next line is where things get interesting. Since we're being called from Python, our parameter list is a pointer to a Python object. We need an integer to work with, so we need to do some translation work - this is what the PyArg_ParseTuple routine does for us. The three parameters to it are, in order, the Python object that we are parsing, the expected format of the argument (the single 'i' representing a single integer in this case), and the location of the variable we want the parsed object stored in.
If parsing is successful, the function returns a non-zero (true) value; if unsuccessful, it returns 0. We use the if to trap a zero return code, allowing us to handle the argument-parsing error gracefully. In this case, we set an error using one of the standard exception types (PyExc_ValueError, which translates into a Python ValueError exception) together with a descriptive string that becomes the exception's message. (PyArg_ParseTuple() has already set an exception of its own by this point; our call replaces it.) We signal the error by returning NULL to our caller, which the interpreter handles as an exception.
Once we've successfully parsed the argument, it's time to do the actual work. Determining if a number is even is a simple matter, so we perform the test and store the result that we'll want to hand back to the calling Python interpreter in the returnValue variable.
The last line of the function is also very Python-esque. We want to return a Python object, not a simple integer, so we have to create that object via the Py_BuildValue() function. The first parameter is the format of the object, again a single integer in this case, then we see a list of sufficient variables to build the described object. This works much like a printf() or scanf() call - the number of parameters after the format string must be exactly equal to the number of items the format string describes.

Aside: Py Arg Parse What?
Passing parameters to functions in a Python script is done in the same way as in a C program - in C, you could have something like this:

    val = getFuncVal(42);

This would be passing the integer value "42" to the function getFuncVal, and returning another value to be placed in the variable 'val'. Likewise, we see virtually the same call in Python, with the exception of the semicolon:

    val = getFuncVal(42)

So why, exactly, do we need to parse a tuple when we want to look at parameters in our new function? Aren't we just passing integers and strings?
As it turns out, we're passing in something a bit more complex - we're passing in a Python object, which has Python information wrapped around the actual integer value we want to work with. In the case of integers, the extra information is essentially a reference count (plus a pointer to the object's type), which the Python interpreter uses to determine when to garbage collect a given object. If the reference count is 1 or greater, the object is considered in use and not garbage collected. If the reference count is 0, it is considered 'free' and is garbage collected, and the memory it was using is returned to the interpreter for later use. In the case of strings, not only is there a reference count, but there's also a string length. Unlike C strings, Python strings may contain embedded null bytes, so without that string length information there's no reliable way to determine where the string is supposed to end.
The PyArg_ParseTuple() function is what strips off the extra Python information and makes the actual parameter available in its "bare" C (or C++) form - a plain integer, or a plain null-terminated string. This function becomes especially important when we start passing Python tuples, lists, and dictionaries to our C functions - we need that function to help us tell the C DSO (dynamic shared object) how to translate the list into an array, or the dictionary into a struct, or how to simply disassemble the tuple into its component strings, integers, and floating point numbers.

Aside Two: How Does Py_BuildValue Do That?
The Py_BuildValue() function is a fairly complex beast - it has to convert C or C++ objects into Python objects. In the Python source code, this routine works by calling a series of helper routines - it takes the parameters passed and uses them as a format string and value list as mentioned previously. It then looks at each item of the format string, creates a PyObject from the corresponding item on the value list, and appends it to the main PyObject being built. It does so recursively over the format string, figuring out what sort of object to build, and building it. The critical function at the bottom of the recursion stack, which gets called for each "singleton" member of the object being built, is do_mkvalue(). This function is effectively a gargantuan switch statement which decides what low-level converter function to use at a given point in the format string, such as PyString_FromStringAndSize(), PyFloat_FromDouble(), or PyInt_FromLong(). This isn't an exhaustive list of the low-level converter functions by any means - that list can be found in the online Python documentation.
Each of those low-level converter functions, in turn, takes the original C value, initializes a Python object, copies the original item's value into the new Python object's value, and then returns a new reference to that PyObject to the caller. These low-level converter functions are also accessible from your module, if you wish to simply return a single value and not use the Py_BuildValue() call.

A More Complex Function - Or Two
In Listing 2, we see the definition of another basic math function, one that takes a single number and returns the factorial of that number. The actual function call available to the Python interpreter is just as simple as our first function, but instead of a single yes/no test, we see a call to a helper function. We also see a return using PyInt_FromLong() versus Py_BuildValue(), but that is merely a cosmetic difference as we've seen above.

LISTING 2

    static char getFactorial__doc__[] =
        "This module takes a number as parameter and returns the "
        "factorial of that number\n";

    /* Forward declaration so getFactorial() can call the helper. */
    static int factorialHelper(int factor);

    static PyObject *getFactorial(PyObject *self, PyObject *args) {
      int inputValue;
      int resultValue;

      if (!PyArg_ParseTuple(args, "i", &inputValue)) {
        PyErr_SetString(PyExc_ValueError, "Argument parsing error");
        return NULL;
      }

      resultValue=factorialHelper(inputValue);
      return PyInt_FromLong(resultValue);
    }

    /* Plain C recursion; no Python objects are created along the way.
       Note: returns 0 for factor <= 0, although 0! is mathematically 1. */
    static int factorialHelper(int factor) {

      if ( factor <= 0 ) {
        return 0;
      }
      if ( factor == 1 ) {
        return 1;
      }
      return factor*factorialHelper(factor-1);
    }

The important question here is why we used a helper function for a recursive call versus simply re-calling the getFactorial() function. The answer is remarkably simple - re-calling getFactorial would involve creating new Python objects from the interim results prior to each recursive call, and would also involve all the additional computation and memory overhead of storing and parsing those Python objects. Since we don't want to waste valuable computing resources, we simply made a helper function that deals with the C objects and variables natively.

One Plus One Equals...
Now we have to tell the main Python interpreter what our module can do. We do this by creating a method definition table, which is exactly what the name would seem to imply - a table listing all the methods that we want to make available to the interpreter.
The table is a static struct PyMethodDef, so for the previous examples to be available we would have the following method table:

    static struct PyMethodDef testmodule_methods[] = {
      { "isEven", isEven, METH_VARARGS,
          "Determine odd/even of a number" },
      { "getFactorial", getFactorial, METH_VARARGS,
          "Calculate the factorial value of a number" },
      { NULL, NULL, 0, NULL }
    };

Let's look at that a bit more closely. We have two functions we're making available, but three entries in the table. The last entry is a sentinel entry, and it must consist of {NULL, NULL, 0, NULL} to properly terminate the table for the Python interpreter.
Each entry in this table consists of four items. The first is the name of the function as it will be called inside the Python interpreter. The second is the name of the C function as defined in the C source for the module. The third entry tells us how parameters will be passed - the possible values include METH_VARARGS, METH_KEYWORDS, METH_VARARGS | METH_KEYWORDS, and 0 (other flags, such as METH_NOARGS and METH_O, also exist). You should always use METH_VARARGS or METH_VARARGS | METH_KEYWORDS unless you really know what you're doing. The fourth item is the function's documentation string; the isEven__doc__-style variables defined earlier can be used here in place of a literal string.

Aside: Varargs? Keywords? Whazzat?
When determining how to pass parameters to your module functions, you will most often use the METH_VARARGS flag in the function table. This means that the parameters will be passed as a Python tuple, which can be parsed with the PyArg_ParseTuple() function. A flag of 0 for this parameter means that the obsolete PyArg_Parse() calling convention is used - this should be avoided if for no other reason than to ensure your module complies with current best-practices for Python modules.
Using the "METH_VARARGS | METH_KEYWORDS" flag makes things much more interesting - you get vastly increased flexibility in how you call the module function, at the price of a little more complexity. In this case, the C function should accept a third argument, again a pointer to a PyObject, which will be a dictionary of keywords. Additionally, you will have to parse the arguments with the PyArg_ParseTupleAndKeywords() function as opposed to the simpler PyArg_ParseTuple() function.

... Four?!?
There's one final routine we have to write to finish out the module - the initialization routine. This is the only non-static function in the entire module, so it'll look a bit different. Let's look at it now:

    PyMODINIT_FUNC inittestmodule(void) {
      PyObject *m, *d;

      m=Py_InitModule("testmodule", testmodule_methods);

      d=PyModule_GetDict(m);
      ErrorObject = Py_BuildValue("s",
          "testmodule module error");
      PyDict_SetItemString(d, "error", ErrorObject);
      if (PyErr_Occurred())
        Py_FatalError("Can't initialize module testmodule!");
    }
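For readers more at home on the Python side, the two calling conventions described in the Varargs/Keywords aside map onto familiar syntax. The sketch below is plain Python written for a current Python 3 interpreter, not part of the C module, and describe_call is a hypothetical name used only for illustration. Positional arguments arrive packed in a tuple, which corresponds to what a METH_VARARGS function receives as args; keyword arguments arrive in a dictionary, which corresponds to the third parameter of a METH_VARARGS | METH_KEYWORDS function.

```python
def describe_call(*args, **kwargs):
    """Return the positional tuple and the keyword dict exactly as received."""
    return args, kwargs


# Two positional arguments and one keyword argument.
positional, keywords = describe_call(42, "spam", verbose=True)

# The positional arguments are a tuple, the keyword arguments a dict.
print(type(positional).__name__, positional)
print(type(keywords).__name__, keywords)
```

Seen this way, PyArg_ParseTuple() is the C-side counterpart of unpacking that tuple, and PyArg_ParseTupleAndKeywords() additionally looks names up in the dictionary.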


Okay, so there's a lot of stuff in that initialization routine we haven't seen yet. Basically, what we're doing is initializing the module, and handling any error that may have occurred (hopefully, no error occurred!).
The PyMODINIT_FUNC is another way of saying "void" for C, adding any special linkages required by the platform we're going to compile under, and in C++ making it 'extern "C"'. You could probably just use "void" as the return type of the init function, but let's be thorough and use PyMODINIT_FUNC. The Py_InitModule takes the name of the module, testmodule, and the method table definition, testmodule_methods, as parameters. It does all the black magic of making the member functions available to the Python interpreter. The rest of the code above is strictly optional, although I tend to include it since I like error checking, but all it does is look to see if there was a problem initializing the module.

LISTING 3

    static struct PyMethodDef testmodule_methods[] = {
      { "isEven", isEven, METH_VARARGS, "Determine odd/even of a number" },
      { "getFactorial", getFactorial, METH_VARARGS,
          "Calculate the factorial value of a number" },
      { NULL, NULL, 0, NULL }
    };

    PyMODINIT_FUNC inittestmodule(void) {
      PyObject *m, *d;

      m=Py_InitModule("testmodule", testmodule_methods);

      d=PyModule_GetDict(m);
      ErrorObject = Py_BuildValue("s", "testmodule module error");
      PyDict_SetItemString(d, "error", ErrorObject);

      if (PyErr_Occurred())
        Py_FatalError("Can't initialize module testmodule!");
    }

Error Checking: A Closer Look
So you want to do the Right Thing and do error checking at module initialization time. Excellent - a good habit to be in. Now you're wondering just what all that extra stuff is doing and when I'll get around to explaining it - the answer to the second part is "right now".
The first command assigns a pointer to a PyObject, the result of calling PyModule_GetDict(). In the Python interpreter, each loaded module has an associated dictionary of function names and meta-information about that module. What we're doing here is grabbing a handle on that dictionary. Next, we build up a Python object using Py_BuildValue, and assign it to the ErrorObject pointer we declared way back at the beginning of the module. Here it holds a text string describing a module error. (Strictly speaking, that makes ErrorObject a plain Python string rather than an exception class; modules that want to raise their own exception type usually create one with PyErr_NewException() instead.)
The next call associates the ErrorObject with the module by setting the ErrorObject to be the value of the item 'error' in the module's dictionary. Since that might be hard to follow (I know it was hard for me to write out), I'll try to explain by using code-like variable representations. Initially, we can imagine the module dictionary as being in the following form:

    testmodule: {
      'name' => 'testmodule';
      'size' => '4 functions';
      'author' => 'jwb';
    }

The actual dictionary wouldn't look anything like that, but that will serve the purposes of this illustration. After we returned from the PyDict_SetItemString() call, our dictionary would look like this:

    testmodule: {
      'name' => 'testmodule';
      'size' => '4 functions';
      'author' => 'jwb';
      'error' => ErrorObject;
    }

Once we've associated the ErrorObject with the module dictionary, we simply check to see if any of the preceding calls, Py_InitModule() included, generated an error. To do so, we call the PyErr_Occurred() function, which returns NULL if no error occurred, and a non-NULL pointer if there was one. If there was, we issue a call to Py_FatalError(), which prints the message we passed to it and aborts the interpreter.

Mix thoroughly, bake at 350, allow to cool, and serve
Now we just have to put all the pieces together into a single file such as in Listing 4. Once this is done, we can compile the module into a .so, drop it into the proper directory for the distribution you're using (for Fedora 7, this would be /usr/lib/python2.5/site-packages/), and start using the module. It's just that simple!

LISTING 4

    #include <Python.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <sys/types.h>

    static PyObject *ErrorObject;

    /* Forward declaration so getFactorial() can call the helper. */
    static int factorialHelper(int factor);

    static char isEven__doc__[] =
        "Determines if a number is even - if so, returns '1' (for TRUE), "
        "else returns 0 (for FALSE)\n";

    static PyObject *isEven(PyObject *self, PyObject *args) {
      int inputValue;
      int returnValue;

      if (!PyArg_ParseTuple(args, "i", &inputValue)) {
        PyErr_SetString(PyExc_ValueError, "Argument parsing error");
        return NULL;
      }

      if ( 0 == inputValue%2 ) {
        returnValue=1;
      } else {
        returnValue=0;
      }

      return Py_BuildValue("i", returnValue);
    }

    static char getFactorial__doc__[] =
        "This module takes a number as parameter and returns the "
        "factorial of that number\n";

    static PyObject *getFactorial(PyObject *self, PyObject *args) {
      int inputValue;
      int resultValue;

      if (!PyArg_ParseTuple(args, "i", &inputValue)) {
        PyErr_SetString(PyExc_ValueError, "Argument parsing error");
        return NULL;
      }

      resultValue=factorialHelper(inputValue);
      return PyInt_FromLong(resultValue);
    }

    /* Note: returns 0 for factor <= 0, although 0! is mathematically 1. */
    static int factorialHelper(int factor) {

      if ( factor <= 0 ) {
        return 0;
      }
      if ( factor == 1 ) {
        return 1;
      }
      return factor*factorialHelper(factor-1);
    }

    static struct PyMethodDef testmodule_methods[] = {
      { "isEven", isEven, METH_VARARGS, "Determine odd/even of a number" },
      { "getFactorial", getFactorial, METH_VARARGS,
          "Calculate the factorial value of a number" },
      { NULL, NULL, 0, NULL }
    };

    PyMODINIT_FUNC inittestmodule(void) {
      PyObject *m, *d;

      m=Py_InitModule("testmodule", testmodule_methods);

      d=PyModule_GetDict(m);
      ErrorObject = Py_BuildValue("s", "testmodule module error");
      PyDict_SetItemString(d, "error", ErrorObject);

      if (PyErr_Occurred())
        Py_FatalError("Can't initialize module testmodule!");
    }

Of course, we first have to know how to compile the .so. In its most basic form, this is two commands - the first one compiles the source to an object (.o) file using GCC. For our example, we would do the following:

    $ gcc -fPIC -I /usr/include/python2.5 -c listing4.c

This causes the listing4.c program to be compiled to object format in the listing4.o file. The -I tells the compiler to search in /usr/include/python2.5 for included header files, which we need in order to find the Python.h file and include its definitions. The -fPIC asks for position-independent code, which shared objects need on x86_64 and which is harmless elsewhere. The second command turns it into a shared object suitable for dynamic loading via a dlopen() call. Again with our example, we do the following:

    $ ld -shared -lpython2.5 listing4.o -o testmodule.so

The -shared option tells the linker to create a shared library as opposed to an executable, the -lpython2.5 tells the linker to also link in the libpython2.5.so shared library, and the -o tells the linker what filename to write - the default is a.out, which is usually not ideal. The filename matters here: when you "import testmodule", the interpreter looks for testmodule.so and calls the inittestmodule() function inside it, so the file must carry the module's name rather than the name of the source file.
Once you have that .so, that's what you drop into the /usr/lib/python2.5/site-packages directory. Using autotools, or even just an RPM spec file, will involve a slightly more complex compilation process, but ultimately the added complexity is just window dressing to what really needs to happen.

John Berninger is a senior linux systems administrator at Gilbarco Veeder-Root in Greensboro, NC. He's been doing linux and unix for far too long to want to be reminded of that number of years, including serving hard time as a Red Hat Consultant on Wall Street. He enjoys getting away from computers via photography and SCUBA diving.
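When trying out a freshly built extension, it can help to have a pure-Python model of what the C functions are expected to return. The following is only a sketch, written for a current Python 3 interpreter and not part of the article's listings: is_even_model, factorial_model, largest_safe_input and C_INT_MAX are hypothetical names, and the limit assumes a platform where a C int is 32 bits wide (common, but not guaranteed). The model deliberately mirrors the listing's behaviour of returning 0 for inputs of zero or less, even though 0! is mathematically 1.

```python
C_INT_MAX = 2**31 - 1  # assumption: the C compiler uses a 32-bit int


def is_even_model(n):
    """Mirror isEven(): 1 when n is even, otherwise 0."""
    return 1 if n % 2 == 0 else 0


def factorial_model(n):
    """Mirror factorialHelper(): 0 for n <= 0, otherwise n!."""
    if n <= 0:
        return 0
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result


def largest_safe_input():
    """Largest n whose factorial still fits in a 32-bit C int."""
    n = 1
    while factorial_model(n + 1) <= C_INT_MAX:
        n += 1
    return n
```

Comparing testmodule.isEven(n) and testmodule.getFactorial(n) against these functions for small n is a quick sanity check after installation. Python integers do not overflow but a C int does, so inputs above the value returned by largest_safe_input() should not be expected to match.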



COLUMN

Welcome to Python
Elegant XML parsing using the
ElementTree Module
by Mark Mruss



XML is everywhere. It seems you can't do much these days unless you utilize
XML in one way or another. Fortunately, Python developers have a new tool in
their standard arsenal: the ElementTree module. This article aims to introduce
you to reading, writing, saving, and loading XML using the ElementTree
module.

REQUIREMENTS
PYTHON: 2.2+
Other Software: Python 2.5 or ElementTree Module
Useful/Related Links:
• http://effbot.org/zone/element-index.htm
• http://effbot.org/zone/element-index.htm#installation
• http://docs.python.org/dev/whatsnew/whatsnew25.html
• http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.XML-function
• http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.parse-function
• http://docs.python.org/lib/module-xml.etree.ElementTree.html

Almost everyone needs to parse XML these days. They're either saving their own information in XML or loading someone else's data. This is why I was glad to learn that as of Python 2.5, the ElementTree XML package has been added to the standard library, inside the xml package.
What I like about the ElementTree module is that it just seems to make sense. This might seem like a strange thing to say about an XML module, but I've had to parse enough XML in my time to know that if an XML module makes sense the first time you use it, it's probably a keeper. The ElementTree module allows me to work with XML data in a way that is similar to how I think about XML data.
A subset of the full ElementTree module is available in the Python 2.5 standard library as xml.etree, but you don't have to use Python 2.5 in order to use the ElementTree module. If you are still using an older version of Python (1.5.2 or later) you can simply download the module from its website and manually install it on your system. The website also has very easy to follow installation instructions, which you should consult to avoid issues while installing ElementTree.
In general, the ElementTree module treats XML data as a list of lists. All XML has a root element with zero


or more child elements. Each of those subelements may in turn have subelements of their own. Let's look at a brief example.
Here's a look at some sample XML data:

    <root>
      <child>One</child>
      <child>Two</child>
    </root>

Here we have a root element with two child elements. Each child element has some text associated with it, seen here as "One" and "Two". Visualizing the XML as a list of lists, or a multidimensional array, you'll see that we have a "root" list, which contains two "child" entries. Not too complicated so far, is it?

Reading XML data
Now let's use the ElementTree package to parse this XML and print the text data associated with each child element. To start, we'll create a Python file with the contents shown in Listing 1.
This is basically a template that I use for many of my simple "*.py" files. It doesn't actually do anything except set up the script so that when the file is run, the main method will be executed. Some people like to use the Python interactive interpreter for simple hacking like this. Personally, I prefer having my code stored in a handy file so I can make simple changes and re-run the entire script when I am just playing around.
The first thing that we need to do in our Python code is import the ElementTree module:

    from xml.etree import ElementTree as ET

Note: If you are not using Python 2.5 and have installed the ElementTree module on your own, you should import the ElementTree module as follows:

    from elementtree import ElementTree as ET

This will import the ElementTree section of the module into your program aliased as ET. However, you don't have to import ElementTree using an alias; you can simply import it and access it as ElementTree. Using ET is demonstrated in the Python 2.5 "What's new" documentation[1] and I think it's a great way to eliminate some key strokes.
Now we'll begin writing code in the main method. The first step is to load the XML data described above. Normally you will be working with a file or URL; for now we want to keep this simple and load the XML data directly from the text:

    element = ET.XML(
        "<root><child>One</child><child>Two</child></root>")

The XML function is described in the ElementTree documentation as follows: "Parses an XML document from a string constant. This function can be used to embed "XML literals" in Python code"[2].
Be careful here! The XML function returns an Element object, and not an ElementTree object as one might expect. Element objects are used to represent XML elements, whereas the ElementTree object is used to represent the entire XML document. Element objects may represent the entire XML document if they are the root element but will not if they are a subelement. ElementTree objects also add "some extra support for serialization to and from standard XML."[3] The Element object that is returned represents the <root> element in our XML data.
Thankfully, the Element object is iterable, so we can use a for loop to loop through its child elements:

    for subelement in element:

This will give us all of the child elements in the root


LISTING 1 element. As mentioned earlier, each element in the XML
1. #!/usr/bin/env python tree is represented as an Element object, so as we iterate
2.  
3. def main():
through the root element's child elements we are getting
4.   pass more Element objects. Each iteration will give us the
5.  
6. if __name__ == "__main__": next child element as an Element object until there are
7.   main()
no more children left. To print out the text associated
LISTING 2 with an Element object we simply have to access the Ele-
ment object's text attribute:
1. #!/usr/bin/env python
2.  
3. from xml.etree import ElementTree as ET for subelement in element:
4.   print subelement.text
5. def main():
6.   element = ET.XML("<root><child>One</child><child>Two</child></root>")
7.   for subelement in element: To recap, have a look at the code in Listing 2. Running
8.     print subelement.text
9.   the code should produce the following output:
10. if __name__ == "__main__":
11.   # Someone is launching this directly
12.   main()
One
Two
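To make the Element versus ElementTree distinction a little more concrete, here is a minimal sketch of my own (not one of the article's listings). The helper names child_texts and wrap_in_tree are made up for illustration, and I use print() as a function so the snippet also runs on current Python, whereas the listings in this article use the Python 2 print statement.

```python
from xml.etree import ElementTree as ET

XML_TEXT = "<root><child>One</child><child>Two</child></root>"


def child_texts(xml_text):
    """Return the text of each direct child of the root element."""
    root = ET.XML(xml_text)  # ET.XML gives back an Element, not an ElementTree
    return [child.text for child in root]


def wrap_in_tree(xml_text):
    """Wrap the parsed root Element in an ElementTree (the whole document)."""
    root = ET.XML(xml_text)
    return ET.ElementTree(root)


if __name__ == "__main__":
    print(child_texts(XML_TEXT))
    # getroot() hands back the same kind of Element that ET.XML returned
    print(wrap_in_tree(XML_TEXT).getroot().tag)
```

If you only need to walk the elements, the Element returned by ET.XML is enough; wrapping it in an ElementTree becomes useful once you want document-level operations such as writing to a file.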


If an XML element does not have any text associated with it, like our root element, the Element object's text attribute will be set to None. If you want to check if an element had any text associated with it, you can do the following:

    if element.text is not None:
        print element.text

Reading XML Attributes

Let's alter the XML that we are working with to add attributes to the elements and look at how we would parse that information.

If the XML uses attributes in addition to (or instead of) inner text, they can be accessed using the Element object's attrib attribute. The attrib attribute is a Python dictionary and is relatively easy to use:

    def main():
        element = ET.XML(
            '<root><child val="One"/><child val="Two"/></root>')
        for subelement in element:
            print subelement.attrib

When you run the code you get the following output:

    {'val': 'One'}
    {'val': 'Two'}

These are the attributes for each child element stored in a dictionary. Being able to work with an XML element's attributes as a Python dictionary is a great feature and fits well with the dynamic nature of XML attributes.

Writing XML

Now that we've tried our hand at reading XML, let's try creating some. If you understand the reading process, you should have no trouble understanding the creation process because it works in much the same manner. What we are going to do in this example is recreate the XML data that we were working with above.

The first step is to create our <root> element:

    #create the root <root>
    root_element = ET.Element("root")

After this code is executed, the variable root_element is an Element object, just like the Element objects that we used earlier to parse the XML.

The next step is to create the two child elements. There are two ways to do this.

In the first method, if you know exactly what you are creating, it's easiest to use the SubElement function, which creates an Element object that is a subelement (or child) of another Element object:

    #create the first child <child>One</child>
    child = ET.SubElement(root_element, "child")

This will create a <child></child> Element that is a child of root_element. We then need to set the text associated with that element. To do this we use the same text attribute that we used in the first parsing example. However, instead of simply reading the text attribute we set its value:

    child.text = "One"

The second approach to creating a child element is to create an Element object separately (rather than a subelement) and append it to a parent Element object. The results are exactly the same - this is simply a different approach that may come in handy when creating your XML, or working with two sets of XML data.

First we create an Element object in the same way that we created the root element:

    #create the second child <child>Two</child>
    child = ET.Element("child")
    child.text = "Two"

This creates the child Element object and sets its text to "Two". We then append it to the root element:

    #now append
    root_element.append(child)

Pretty simple! Now, if we want to look at the contents of our root_element (or any other Element object for that matter) we can use the handy tostring function. It does exactly what its name suggests: it converts an Element object into a human readable string.

    #Let's see the results
    print ET.tostring(root_element)

To recap, have a look at the code in Listing 3. When you run this code you will get the following output:

    <root><child>One</child><child>Two</child></root>

Writing XML attributes

If you want to create the XML with attributes (as illustrated in the second reading example), you can use the Element object's set method. To add the val attribute to the first element, use the following:

    child.set("val","One")

You may also set attributes when you create Element objects:

    child = ET.Element("child", val="One")
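Putting the two creation approaches and the two ways of setting attributes together, a small round trip might look like the sketch below. This is my own illustration rather than one of the article's listings: build_root and read_vals are hypothetical helper names, and reading through the Element's get method (instead of indexing attrib directly) is simply a choice I made so that a missing attribute yields None.

```python
from xml.etree import ElementTree as ET


def build_root():
    """Build <root> holding two <child> elements that carry a val attribute."""
    root = ET.Element("root")

    # First approach: SubElement creates the child and attaches it in one step
    first = ET.SubElement(root, "child")
    first.set("val", "One")

    # Second approach: create a free-standing Element, then append it
    second = ET.Element("child", val="Two")
    root.append(second)
    return root


def read_vals(root):
    """Return the val attribute of each child (None where it is missing)."""
    return [child.get("val") for child in root]


root = build_root()

if __name__ == "__main__":
    print(read_vals(root))
```

Either approach leaves root with the same structure, so which one you use mostly depends on whether the parent element already exists when you create the child.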


LISTING 3

    #!/usr/bin/env python

    from xml.etree import ElementTree as ET

    def main():
        #create the root <root>
        root_element = ET.Element("root")
        #create the first child <child>One</child>
        child = ET.SubElement(root_element, "child")
        child.text = "One"
        #create the second child <child>Two</child>
        child = ET.Element("child")
        child.text = "Two"
        #now append
        root_element.append(child)
        #Let's see the results
        print ET.tostring(root_element)

    if __name__ == "__main__":
        # Someone is launching this directly
        main()

Reading XML files

Many times you won't be working with XML data that you explicitly create in your code. Instead, you will usually read the XML data in from a data source, work with it, and then save it back out when you are done. Fortunately, configuring ElementTree to work with different data sources is very easy. For example, let's take the XML data that we first used and save it to a file named our.xml in the same location as our Python file.

There are a few methods that we can use to load XML data from a file. We are going to use the parse function. This function is nice because it will accept, as a parameter, the path to a file or a "file-like" object. The term "file-like" is used on purpose because the object does not have to be a file object per se - it simply has to be an object that behaves in a file-like manner. A "file-like" object is an object that implements a "file-like" interface, meaning that it shares many (if not all) methods with the file object. If an object is "file-like" this fact will usually be prominently mentioned in its documentation.

The first thing that we need in order to load the XML data is to determine the full path to the our.xml file. In order to calculate this, we determine the full path of our Python source file, strip the filename from it, and then append our.xml to the path. This is rather simple given that the __file__ attribute (available in Python 2.2 and later) is the relative path and filename of our Python source file. Although the __file__ attribute will be a relative path, we can use it to calculate the absolute path using the standard os module:

    import os

We then call the abspath function to get the absolute path:

    xml_file = os.path.abspath(__file__)

However, since we only want the directory name (not the full path and filename of our Python source file) we have to strip off the filename:

    xml_file = os.path.dirname(xml_file)

Now that we have the directory in which the our.xml file resides, all we have to do is append the our.xml filename to the xml_file variable. However, instead of just doing something like:

    xml_file += "/our.xml"

we will use the os module to join the two paths so that the resulting path is always correct regardless of what operating system our code is executed on:

    xml_file = os.path.join(xml_file, "our.xml")

Note: If you have any trouble understanding what any of the code used to determine the path of our.xml is doing, try printing out xml_file after each of the above lines and it should become clear.

We now have the full path to the our.xml file. In order to load its XML data we simply pass the path to the parse function:

    tree = ET.parse(xml_file)

We now have an ElementTree object instance that represents our XML file.

Since we are working with files, we should watch out for incorrect paths, I/O errors, or the parse function failing for any other reason. If you wish to be extra careful, you can wrap the parse function in a try/except block in order to catch any exceptions that may be thrown:

    try:
        tree = ET.parse(xml_file)
    except Exception, inst:
        print "Unexpected error opening %s: %s" % (xml_file, inst)
        return

In the except block, I catch the Exception base class so that I catch any and all exceptions that may be thrown (in the case of a missing file it will most likely be an IOError exception).

Writing XML data to a file

Now that we know how to read in XML data, we should look at how one writes XML data out to a file. Let's assume that after reading in the our.xml file we want to add another item to the XML file that we just read in:

    child = ET.SubElement(tree.getroot(), "child")
    child.text = "Three"
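Before moving on, here is a rough sketch of the path handling and loading steps from the previous section, ending with the same "add a third child" step. It is my own illustration with a few assumptions: sibling_path and load_tree are hypothetical helper names, an in-memory io.BytesIO object stands in for our.xml to show the "file-like" idea without needing a file on disk, and I catch only IOError and ET.ParseError (available in newer versions of the module) rather than the broad Exception used in the article.

```python
import io
import os
from xml.etree import ElementTree as ET


def sibling_path(source_file, name):
    """Absolute path of `name` in the same directory as `source_file`."""
    directory = os.path.dirname(os.path.abspath(source_file))
    return os.path.join(directory, name)


def load_tree(source):
    """Parse `source` (a path or a file-like object); return None on failure."""
    try:
        return ET.parse(source)
    except (IOError, ET.ParseError) as err:
        print("Unexpected error opening %s: %s" % (source, err))
        return None


# A file-like object standing in for our.xml
fake_file = io.BytesIO(b"<root><child>One</child><child>Two</child></root>")
tree = load_tree(fake_file)

# Same step as in the article: attach a new child to the root element
child = ET.SubElement(tree.getroot(), "child")
child.text = "Three"
```

In a real script you would call load_tree(sibling_path(__file__, "our.xml")) and bail out of main when it returns None.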


Notice that in order to add a child to the root element we used the ElementTree object's getroot function. The getroot function simply returns the root Element object of the XML data.

Now that we have a third child element, let's write the XML data back out to our.xml. Thanks to ElementTree this is a painless experience:

    tree.write(xml_file)

That's it!

If we want to be really careful when writing the XML data out to a file, we'll watch out for exceptions. However, most of the time the write method will succeed without throwing an exception; it is more important to be sure that the path used is correct. Oftentimes, instead of getting the exception that you want, you end up with an XML file stored in some far off and strange location on your hard drive because your path was incorrect or you did not specify the full path. But, as is often the case when programming, better safe than sorry:

    try:
        tree.write(xml_file)
    except Exception, inst:
        print "Unexpected error writing to file %s: %s" % (xml_file, inst)
        return

To recap, you can find all of the code from this section in Listing 4. When you run the code and look at the our.xml file you should see that the third child element has been added:

    <root>
        <child>One</child>
        <child>Two</child>
        <child>Three</child>
    </root>

Reading from the Web

Working with a local file is very useful, but you might also be in a situation where you will have to work with an XML file that is located on the Internet, perhaps an RSS feed. Fortunately, since the parse function explained above works with file-like objects, loading a URL is very easy.

First off, you need to import the urllib module. It's a standard module that allows you to open URLs in a manner similar to opening local files:

    import urllib

In order to open a URL we use:

    feed = urllib.urlopen("http://pythonmagazine.com/c/news/atom")
    tree = ET.parse(feed)

And that's that! This concludes our brief introduction to XML parsing using the ElementTree module. Hopefully throughout this article you have seen how easy it is to create and manipulate XML using ElementTree ...and I've only scratched the surface. For more information take a look at the official Python documentation and some of the great examples on the effbot website. I'm sure you'll be an XML wizard in no time.
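As a parting sketch for the last two sections, the snippet below writes a modified tree back to disk, reads it in again, and then feeds parse a file-like object of the kind urlopen would return. Treat it as an illustration under assumptions rather than the article's own code: append_child_and_save and texts_from are hypothetical helpers, the work happens in a temporary directory instead of next to the script, and an io.BytesIO object with made-up feed content stands in for a real network response so the example runs offline. (On Python 3 the urlopen function lives at urllib.request.urlopen.)

```python
import io
import os
import shutil
import tempfile
from xml.etree import ElementTree as ET


def append_child_and_save(path, text):
    """Parse the file at `path`, append <child>text</child>, write it back."""
    tree = ET.parse(path)
    child = ET.SubElement(tree.getroot(), "child")
    child.text = text
    tree.write(path)


def texts_from(source):
    """Return the text of the root's children; `source` is a path or file-like."""
    return [child.text for child in ET.parse(source).getroot()]


# Work in a temporary directory and remove it once we are done
workdir = tempfile.mkdtemp()
xml_file = os.path.join(workdir, "our.xml")
with open(xml_file, "w") as handle:
    handle.write("<root><child>One</child><child>Two</child></root>")

append_child_and_save(xml_file, "Three")
saved = texts_from(xml_file)
shutil.rmtree(workdir)

# Stand-in for the object returned by urlopen: parse only needs read()
fake_feed = io.BytesIO(b"<feed><title>News</title></feed>")
feed_texts = texts_from(fake_feed)
```

Because parse treats paths and file-like objects the same way, texts_from needs no special case for the web response.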

LISTING 4

    #!/usr/bin/env python

    from xml.etree import ElementTree as ET
    import os

    def main():

        xml_file = os.path.abspath(__file__)
        xml_file = os.path.dirname(xml_file)
        xml_file = os.path.join(xml_file, "our.xml")

        try:
            tree = ET.parse(xml_file)
        except Exception, inst:
            print "Unexpected error opening %s: %s" % (xml_file, inst)
            return

        child = ET.SubElement(tree.getroot(), "child")
        child.text = "Three"

        try:
            tree.write(xml_file)
        except Exception, inst:
            print "Unexpected error writing to file %s: %s" % (xml_file, inst)
            return

    if __name__ == "__main__":
        # Someone is launching this directly
        main()

FOOTNOTES
[1] http://docs.python.org/whatsnew/modules.html#SECTION0001420000000000000000
[2] http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.XML-function
[3] http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.ElementTree-class

For the last seven years Mark Mruss has worked as a software developer, programming in the much maligned C++. In 2005 Mark decided it was time to add another language to his arsenal. After reading Eric Raymond's well known article "Why Python?" he set his sights on the inviting world of Python.



COLUMN

Random Hits
The Python Community



by Steve Holden

I've always been fairly community-minded. In the 1970s I was Treasurer of DECUS in the UK, and in the 1980s I was Chairman of the Sun UK User Group. I accepted those positions because of a belief in the value of communities bound by a common interest in solving problems using specific technologies. This might seem a bit dangerous - the old saying that if the only tool you have is a hammer then all problems look like nails is very true, but the technologies I have been interested in all my professional life are much more flexible than hammers. Which can be a good thing or a bad thing: there are many different types of nail too.

I suppose this focus on community has to an extent structured my career, such as it has been. If anyone can claim to have started PyCon I suppose it's me, and the primary impetus behind the action was my attendance at my first International Python Conference. This was a typical commercial affair costing around six hundred dollars (plus travel and hotel for those who weren't local to the event), and my initial response to it was "I bet there are a lot of people who would like to go to Python conferences but can't afford this".

So I became more involved with the affairs of the Python Software Foundation and then Guido van Rossum (the inventor of Python) asked me to chair the first PyCon in 2003. We could have gone in for extensive planning sessions, but we might still be planning the first PyCon had we done that - no "big design up front" for the agile community! As Win Borden wrote, "If you wait to do everything until you're sure it's right, you'll probably never do much of anything."

That first PyCon had an atmosphere I shall never forget. It was almost as though the convicts had taken over the prison, and people were alight with the tangible sense of new possibilities. This was inevitably followed by PyCon DC 2004 and 2005, and now we've had PyCon TX 2006 and 2007, with a change of venue as we were victims of our own success: we attracted around 250 people in the first year, and by the third year had clearly outgrown our original home at George Washington University. The delegate count was almost 600 in 2007, so by most reasonable standards I guess the idea can be considered a success.

After two years I decided it was time to give up the PyCon chair. I believe there is a danger that these things can become personal fiefdoms, which leads to stagnation and loses the delightful spontaneity. I had started to feel a little uncomfortable because there were signs that, while "the Python community" enjoyed these conferences, there were many delegates who would have been prepared to help (and indeed who would have loved to help) but whose skills and energy weren't tapped for one reason or another.

Part of the problem was that things hadn't been terribly organized. There wasn't much of a history of non-commercial Python conferences, so people didn't really know what to expect, and I deliberately took a somewhat freewheeling approach rather than try to stamp too many of my own ideas on the nascent conference. Community events can tend towards chaos, but over the first three years PyCon delegates seemed to have been empowered enough to use PyCon to get together and talk about topics of mutual interest.

After my third year I managed to pass on the torch to someone else. Andrew Kuchling, assisted by Jeff Rush, brought a more organized approach to the event and managed to bring in more volunteers as we moved to Texas for 2006 and 2007. David Goodger will be chairing the 2008 event in Chicago and as PyCon enters its sixth year it appears to be *the* established Python community event.

I believe the main achievement of my three years as chair was getting the Python community to realize that it can organize better conferences than professional conference organizers can. This is a practical demonstration of the truth that individuals can and do make a difference, which goes hand in hand with my "roll up your sleeves and get on with it" philosophy. It is also a further demonstration of the effectiveness of the open source approach, and PyCon planning has always been a fairly open process.

How is PyCon "better" than the old International Python Conference? Well, for a start, it is way more affordable. Although I have at times worked in the proprietary systems world I have never felt entirely comfortable with the idea that you should sell your products for the maximum possible amount. In the world of open source, where a lot of people aren't in it for the money, high prices can exclude the best talent. That isn't really in anyone's interest. The initial ethic was that everyone would pay the same, and even as Chairman I cheerfully forked over my registration fee.

More than that, though, I think that PyCon is more inclusive, allowing a wider range of contributions and a broader perspective of what Python is actually being used for. I hope in the long term that will be good for Python's development, and will help to keep the developers in touch with their user base. This will in turn maintain Python's relevance to contemporary problems.

Now that attendance has grown I am interested to see that the organizers are starting to talk about using conference funds to help those who make a positive contribution (particularly speakers) to attend, and to offer commercial delegates the opportunity to pay a higher fee. Believe it or not, PyCon's low price can act as a disincentive, and some people have difficulty persuading their corporate sponsors that a sub-$200 conference can be worthwhile.

Steve Holden is a consultant, instructor and author active in networking and security technologies. He is Director of the Python Software Foundation and a recipient of the Frank Willison Memorial Award for services to the Python community.
