CONTENTS

FEATURES
9 | Creating custom PyGTK widgets with Cairo
Sayamindu Dasgupta
Empower your users with the perfect widget.

18 | Working with IMAP and iCalendar
Doug Hellmann
Integrate iCalendar functionality into your Exchange-like groupware service.

COLUMNS
3 | Import This
Welcome to the debut of Python Magazine

40 | Welcome to Python
XML Processing with the (now built-in!) ElementTree module
Welcome to the premier issue of Python Magazine. Projects like a new magazine tend to feel like a huge party. You put all kinds of time and effort into it, you call decorators, caterers, chair rentals, balloon blowers, clowns, and you get everything coordinated. Then, once everything is in place, you pray that people actually show up.
I'm excited (and relieved!) to see that the community has figured out a couple of things that have made them willing to embrace the idea of a magazine devoted to Python.
First, we've done this before. Python Magazine is not the first magazine I've helped launch, nor is it the first one that MTA (the publisher) has launched. We are already intimately familiar with the problems inherent in trying to produce a magazine that is timely, accurate, thorough, and even entertaining, on a
I guess I'm a bit of a workaholic, maybe. Truth is, I'm an infrastructure services architect (a fancy sort of sysadmin) by day. Part of my job is to write code to do various things and touch various services that I maintain. I had wanted to give Python a try for a while, and an opportunity presented itself. I took the plunge, and fell in love. However, I found that one of my favorite learning resources was unavailable in the Python world: the venerable how-to magazine!
These magazines exist for lots of other topics. There are magazines that'll tell you how to brew beer, how to work with wood, how to play pool, how to cook, how to stay in shape, how to take pictures, how to write, and how to use your computer in various ways (or how to use various computers in one particular way, as the case may be). Heck, there's even a magazine on how to code in PHP! For crying out loud, where's the Python mag?!
There wasn't one. It wasn't for lack of trying. Various attempts had failed to gain momentum for whatever reason, but I wasn't going to let something silly like a path littered with the corpses of past failed attempts get in the way of having a magazine I could read to glean inspiration and knowledge about my new favorite programming language!
And so, I went to the publisher, the one I had worked with on php|architect. I told him that I wanted to learn Python better, and so did lots of other people, and that there was no magazine for them, and there was no magazine for me. I told him that people who already knew Python didn't know everything they wanted to know, and there was no magazine for them, either. I shed a tear for effect. Now, here we are, only 4 months after the initial "ok, go for it!", and
So, in a way, it's happening because I want to know Python better. But it's

Publisher
Marco Tabini
Editor-in-Chief
Brian Jones
Technical Editor
Doug Hellmann
Contributing Editor
Steve Holden
Authors
John Berninger, Sayamindu Dasgupta, Doug Hellmann, Steve Holden, Mark Mruss, Kevin T. Ryan

Python Magazine is published twelve times a year by Marco Tabini & Associates, Inc., 28 Bombay Ave., Toronto, ON M3H1B7, Canada.
Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in any associated material.
Python Magazine, PyMag, the Python Magazine logo, Marco Tabini & Associates, Inc. and the Mta Logo are trademarks of Marco Tabini & Associates, Inc.
For all enquiries, visit
Printed in Canada
Copyright © 2003-2007 Marco Tabini & Associates, Inc. All Rights Reserved

3 • Python Magazine • October 2007
EDITORIAL Import This
also happening because there are lots of people at various experience levels who would like to know how to do something new, or something old in a better way, with Python. It's happening because the community wants it to happen as well. I got lots of positive feedback from various IRC channels, emails to community members, and even Guido himself.

The Premier Issue
This first issue dives right in, and is made to look like we've done this before, rather than spending time dwelling on the fact that this is issue 1 of a new magazine. People who've shown up for the code are less interested in hearing us big-headed editors patting ourselves on the back.
However, since this is a highly specialized magazine, it's likely that I have a built-in, captive audience of

Meet the Rest of the Staff at PyMag
Doug Hellmann is a Senior Software Engineer at Racemi. He has been programming in Python since version 1.4 on a variety of Unix and non-Unix platforms. He has worked on projects ranging from mapping to medical news publishing, with a little banking thrown in for good measure.
And Now For Something Completely Different
Has your multi-threaded application grown GILs? Take a look at these packages
for easy-to-use process management and interprocess communication tools.
There is no predefined theme for this column, so I plan to cover a different, likely unrelated, subject every month. The topics will range anywhere from open source packages in the Python Package Index (formerly The Cheese Shop, now PyPI) to new developments from around the Python community, and anything that looks interesting in between. If there is something you would like me to cover, send a note with the details to doug.hellmann@pythonmagazine.com, or add the link to your del.icio.us account with the tag "pymagdifferent".
I will make one stipulation for my own sake: any open source libraries must be registered with PyPI and configured so that I can install them with distutils. Creating a login at http://pypi.python.org/ and registering your project is easy, and only takes a few minutes. Go on, you know you want to.

REQUIREMENTS
PYTHON: 2.5
Other Software:
• Richard Oudkerk's "processing" package version 0.34 or higher
http://pypi.python.org/pypi/processing
• Vitalii Vanovschi's "parallel python" package
http://www.parallelpython.com/
Useful/Related Links:
• "It isn't Easy to Remove the GIL"
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
• "Can't we get rid of the Global Interpreter Lock?"
http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock

Scaling Python: Threads vs. Processes
In the ongoing discussion of performance and scaling issues with Python, one persistent theme is the Global Interpreter Lock (GIL). While the GIL has the advantage of simplifying the implementation of CPython internals and extension modules, it prevents users from achieving true multi-threaded parallelism by limiting the interpreter to executing byte-codes in only one thread at a time, no matter how many processors are available. Threads which block on I/O or use extension modules written in another language can release the GIL to allow other threads to take over control, of course. But if my application is written entirely in Python, only a limited number of statements will be executed before one thread is suspended and another is started.

"Parallel Python is impressive, but it is not the only option for managing parallel jobs."

Eliminating the GIL has been on the wish lists of many Python developers for a long time; I have been working with Python since 1998 and it was a hotly debated topic even then. Around that time, Greg Stein produced a set of patches for Python 1.5 that eliminated the GIL entirely, replacing it with a whole set of individual locks for the mutable data structures (dictionaries, lists, etc.) that had been protected by the GIL. The result was an interpreter that ran at roughly half the normal speed, a side-effect of acquiring and releasing the individual locks used to replace the GIL.
The GIL issue is unique to the C implementation of the interpreter. The Java implementation of Python, Jython, supports true threading by taking advantage of the underlying JVM. The IronPython port, running on Microsoft's CLR, also has better threading. On the other hand, those platforms are always playing catch-up with new language or library features, so if you're hot to use the latest and greatest, like I am, the C reference implementation is still your best option.
Dropping the GIL from the C implementation remains a low priority for a variety of reasons. The scope of the changes involved is beyond the level of anything the current developers are interested in tackling. Recently, Guido has said he would entertain patches contributed by the Python community to remove the GIL, as long as performance of single-threaded applications was not adversely affected. As far as I know, no one has announced any plans to do so.
Even though there is a FAQ entry on the subject as part of the standard documentation set for Python, from time to time a request pops up on comp.lang.python or one of the Python-related mailing lists to rewrite the interpreter so the lock can be removed. Each time it happens, the answer is clear: use processes instead of threads.
That response does have some merit. Extension modules become more complicated without the safety of the GIL. Processes typically have fewer inherent deadlocking issues than threads. They can be distributed between the CPUs on a host, and even more importantly, an application that uses multiple processes is not limited by the size of a single server, as a multi-threaded application would be.
Since the GIL is still present in Python 3.0, it seems unlikely that it will be removed from a future version. That is not the end of the world. There are, after all, strategies for working with multiple processes to scale large applications. I'm not talking about the well worn, established techniques from the last millennium that use a different collection of tools on every platform, nor the time-consuming and error-prone practices that lead to solving the same problem time and again. Techniques using low-level, operating system-specific libraries for process management are as passé as using compiled languages for CGI programming. I don't have time for this low-level stuff any more, and neither do you. Let's look at some modern alternatives.

The subprocess module
Version 2.4 of Python introduced the subprocess module and finally unified the disparate process management interfaces available in other standard library packages to provide cross-platform support for creating new processes. While subprocess solved some of my process creation problems, it still primarily relies on pipes for interprocess communication. Pipes are workable, but fairly low-level as far as communication channels go, and using them for two-way message passing while avoiding I/O deadlocks can be tricky (don't forget to flush()). Passing data through pipes is definitely not as transparent to the application developer as sharing objects natively between threads. And pipes don't help when the processes need to scale beyond a single server.

Parallel Python
Vitalii Vanovschi's Parallel Python package (pp) is a more complete distributed processing package that takes a centralized approach. Jobs are managed from
a "job server", and pushed out to individual processing "nodes".
Those worker nodes are separate processes, and can be running on the same server or on other servers accessible over the network. And when I say that pp pushes jobs out to the processing nodes, I mean just that: the code and data are both distributed from the central server to the remote worker node when the job starts. I don't even have to install my application code on each machine that will run the jobs.
Here's an example, taken right from the Parallel Python Quick Start guide:

import pp
job_server = pp.Server()
# Start tasks
f1 = job_server.submit(func1, args1, depfuncs1, modules1)
f2 = job_server.submit(func1, args2, depfuncs1, modules1)

The package hides most of the details of selecting an appropriate communication technique for the platform by choosing reasonable default behaviors at runtime. The API does include a way to explicitly select the communication mechanism, in case I need that level of control to meet specific performance or compatibility requirements. As a result, I end up with the best of both worlds: usable default settings that I can tweak later to improve performance.
To make life even easier, the processing.Process class was purposely designed to match the threading.Thread class API. Since the processing package is almost a drop-in replacement for the standard library's threading module, many of my existing multi-threaded applications can be converted to use processes simply by changing a few import statements. That's the sort of upgrade path I like.
Listing 1 contains a simple example, based on the ex-
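Separately from the article's own listings, the Thread-to-Process swap described above can be sketched with the multiprocessing module, the standard-library descendant of the processing package (it was added in Python 2.6). The sketch uses multiprocessing in place of processing, and the function names are hypothetical. The Process calls mirror the Thread calls, but because processes do not share memory, the results dictionary has to come from a manager instead of being a plain dict. It also assumes a POSIX host, because it requests the "fork" start method explicitly.

```python
import multiprocessing
import threading


def square_into(value, results):
    """Worker: record value squared under the key `value`."""
    results[value] = value * value


def run_with_threads(values):
    results = {}  # threads can share this ordinary dict directly
    workers = [threading.Thread(target=square_into, args=(v, results))
               for v in values]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


def run_with_processes(values):
    # Assumption: a POSIX host; the "fork" start method is unavailable on Windows.
    ctx = multiprocessing.get_context("fork")
    with ctx.Manager() as manager:
        # Processes do not share memory, so use a manager-backed dict proxy.
        results = manager.dict()
        # Same target/args keywords and start()/join() calls as threading.Thread.
        workers = [ctx.Process(target=square_into, args=(v, results))
                   for v in values]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        # Copy into a plain dict before the manager process shuts down.
        return results.copy()
```

Both run_with_threads([1, 2, 3]) and run_with_processes([1, 2, 3]) should return {1: 1, 2: 4, 3: 9}; the only structural difference is where the shared dictionary lives.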
sizes. LocalManager is interesting, but it's not what has me excited. The SyncManager is the real story.
SyncManager implements tools for synchronizing interprocess communication in the style of threaded programming. Locks, semaphores, condition variables, and events are all there. Special implementations of Queue, dict, and list that can be used between processes safely are included as well (Listing 2). Since I'm already comfortable with these APIs, there is almost no learning curve for converting to the versions provided by the processing module.
For basic state sharing with SyncManager, using a Namespace is about as simple as I could hope. A namespace can hold arbitrary attributes, and any attribute attached to a namespace instance is available in all client processes which have a proxy for that namespace. That's extremely useful for sharing status information, especially since I don't have to decide up front what information to share or how big the values can be. Any process can change existing values or add new values to the namespace, as illustrated in Listing 3. Changes to the contents of the namespace are reflected in the other processes the next time the values are accessed.

Remote Servers
Configuring a SyncManager to listen on a network socket gives me even more interesting options. I can start processes on separate hosts, and they can share data using all of the same high-level mechanisms described above. Once they are connected, there is no difference in the way the client programs use the shared resources remotely or locally.
The objects are passed between client and server using pickles, which introduces a security hole: because
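As a rough illustration of the Namespace sharing described above, here is a sketch using Manager().Namespace() from the multiprocessing module, again standing in for the processing package and again assuming a POSIX host for the "fork" start method. report_progress(), namespace_demo() and the attribute names are hypothetical. The child updates an existing attribute and adds a new one, and the parent reads both through its own proxy after the child exits.

```python
import multiprocessing


def report_progress(ns):
    """Child process: change an existing attribute and add a new one."""
    ns.status = "finished"
    ns.items_done = 3  # did not exist when the parent created the namespace


def namespace_demo():
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX host
    with ctx.Manager() as manager:
        # A proxy to a namespace object that lives in the manager's process.
        ns = manager.Namespace()
        ns.status = "starting"
        child = ctx.Process(target=report_progress, args=(ns,))
        child.start()
        child.join()
        # Attribute reads go through the proxy, so they see the child's changes.
        return ns.status, ns.items_done
```

Because manager traffic is pickled, a manager that listens on a network socket should be given an authkey (BaseManager accepts one as a constructor argument) and exposed only to hosts you trust.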
PyGTK, a set of Python bindings for the popular GTK+ graphical toolkit, provides
a rich collection of commonly used windows, dialog boxes, buttons, layout
elements, and other 'widgets'. However, often a programmer has needs which
go beyond the functionality provided by the built-in widgets in PyGTK. This
article explains how to create new widgets using the Python bindings for Cairo
– the vector graphics library used by GTK+ to perform most of its drawing
operations.
About GTK+
GTK+ is one of the most popular free/open source Graphical User Interface (GUI) toolkits around. Though best known as the basic building block of GNOME, a popular free/open source desktop, GTK+ was originally written for the GIMP image editing program (in fact, 'GTK' actually stands for 'Gimp Tool Kit'). Apart from its role in the GNOME and GIMP projects, it is currently also used to create the GUIs for the XFCE and Rox desktops, and in embedded devices such as the Nokia N800/N770 (as a part of the Hildon desktop) and the FIC Neo1973 (as a part of the OpenMoko framework).

REQUIREMENTS
PYTHON: 2.x
Other Software: PyGTK 2.10 or above
Useful/Related Links:
http://www.pygtk.org
http://www.cairographics.org
Though written in C, GTK+ supports object-oriented features using GObject, and it also has an excellent set of bindings for Python, known as PyGTK. In fact, Python is fast becoming one of the primary languages of choice for upcoming GNOME applications, as more and more developers grow to love the language's simplicity and ease of use. Some of the upcoming GNOME applications written in Python include Sabayon (a user profile editor), Jokosher (a multi-track audio editor), and Pitivi (a video editor), to name just a few. Apart from GTK+, all other major components of the GNOME Development Platform have Python bindings as well, a factor that also contributes to the adoption of Python within GNOME.

make converting a little less straightforward, but those peculiar cases are usually few and far between.

How Cairo fits into GTK+
From version 2.8 onwards, GTK+ includes Cairo support, making it possible for developers to access the Cairo drawing API directly from within GTK+. This means that GTK+ developers can draw their widgets with the Cairo API instead of the GDK (GIMP Drawing Kit) drawing functions. In fact, at present, most of the stock GTK+ widgets and theme engines use Cairo to do the rendering and drawing operations.
from the coordinate points (10, 10) to (120, 130). Note that the origin point (0, 0) of the surface is the top left corner. The value of the X coordinate increases as you move from left to right, while the Y coordinate increases downwards.

ctx.move_to(10, 10)
ctx.line_to(120, 130)
ctx.stroke()

The first line (ctx.move_to(10, 10)) begins a new sub-path and sets the current point to (10, 10). The second line (ctx.line_to(120, 130)) adds a line from the current point (10, 10) to the point (120, 130) to the path. The third line (ctx.stroke()) actually makes the line visible (you can think of the stroke() method as something that drags a virtual pen over your path). However, one thing should be kept in mind while using the stroke() method: the path information is reset after the stroke operation is

current point (175, 150) to the beginning of the current sub-path, defined as the point passed to the last invocation of move_to(). In this case, that's (50, 10). See Figure 2 for the output of the above code. Drawing any other polygon is a similar process (with the exception of a rectangle, which has its own convenience method rectangle(x0, y0, width, height), where (x0, y0) is the top left corner of the rectangle).
Cairo lets you draw cubic Bézier curves with the method curve_to(x0, y0, x1, y1, x2, y2), where the curve starts at the current point, (x0, y0) and (x1, y1) are the two control points, and (x2, y2) is the point where the curve ends. To draw arcs, use the method arc(x, y, radius, angle1, angle2), where (x, y) is the center of the arc, radius is the radius, and angle1 and angle2 are the starting and ending angles of the arc to be drawn. Angles are measured in radians: an angle of 0 points along the positive X-axis, and an angle of math.pi/2 (90 degrees) points along the positive Y-axis, which is straight down because Y increases downwards. The angle
FIGURE 2 FIGURE 3
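To make the curve_to() and arc() parameters described above concrete, this plain-Python sketch (no Cairo needed; the function names are hypothetical) evaluates the standard cubic Bézier formula that curve_to() follows, and computes the point an arc reaches at a given angle in Cairo's default Y-down coordinate system.

```python
import math


def cubic_bezier_point(start, control1, control2, end, t):
    """Return the (x, y) point at parameter t (0 <= t <= 1) of a cubic Bezier.

    For ctx.curve_to(x0, y0, x1, y1, x2, y2), `start` is the current point,
    (x0, y0) and (x1, y1) are the control points and (x2, y2) is `end`.
    """
    u = 1.0 - t
    # Bernstein weights of the four points.
    weights = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    points = (start, control1, control2, end)
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return (x, y)


def arc_point(cx, cy, radius, angle):
    """Point at `angle` radians on a circle centred at (cx, cy).

    With Cairo's default transform Y grows downwards, so angle 0 points
    to the right and math.pi / 2 points straight down.
    """
    return (cx + radius * math.cos(angle), cy + radius * math.sin(angle))
```

The curve passes through its start and end points but generally not through the control points, which only pull it toward them. Degrees convert with math.radians(), so ctx.arc(x, y, r, 0, math.radians(90)) should sweep a quarter circle from the positive X-axis down to the positive Y-axis.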
ranging from 0 to 1). So if you want to use the color red, you can use the method set_source_rgb(1, 0, 0) (the 8-bit RGB value for red is 255, 0, 0, which scales to 1, 0, 0).
You may also use the method set_source_rgba() to set the alpha channel value for transparency. So if you want fully transparent red (invisible), you would use set_source_rgba(1, 0, 0, 0); for a red which is 75% opaque, you would use set_source_rgba(1, 0, 0, .75); and for a fully opaque red, you would use set_source_rgba(1, 0, 0, 1).
To actually put the source on your destination surface, you need the fill() method, which fills the area enclosed by your path with the source. So if you want a red triangle (Figure 3), your code would be

custom PyGTK widget. Almost every custom widget is created as a subclass of a standard widget (a gtk.TextView or a gtk.Window, for instance). So the first line of a custom widget's code looks like class MyWidget(gtk.Window):.
The method you'll probably want to override is do_expose_event(), which is the event handler for an expose event. The expose event occurs when the widget that received the signal needs to be redrawn for some reason. However, when you are writing a widget from scratch (a subclass of gtk.Widget), some other methods need to be taken care of. The most important of these methods are:
control buttons on her keyboard (Figure 4). It supports a semi-transparent background, which is one of those things that has become very common and very easy to use since the advent of Cairo and other Xorg technologies such as the COMPOSITE extension. However, it also gracefully degrades to a solid black background if the user is running it in an environment where translucency is not possible. Note that, if you wanted, it is possible to make an entire gtk.Window translucent with the set_opacity() method, but in this case, only the background needs to be translucent. The widget also shows an icon for a speaker and a white bar to indicate the volume level. Since the widget is a special form of GTK window (one without any decoration, and with a custom background), we'll make it a subclass of gtk.Window. Since we want to change the look and feel of the widget, we will be overriding the signal handler for the expose event (the do_expose_event() method). The widget will also have a property called level which will be used to set the length of the volume level indicator bar. Moreover, contrary to the behaviour of a normal gtk.Window, our window will also emit a 'clicked' signal if someone clicks on it. This will allow the developer who is using the widget to close it (or do something even fancier) if the user clicks on the window.

"... or any of the output targets supported by Cairo."

A walkthrough of the code
So, without any more boring theory, let us dive straight into the code. The first few lines are usual Python stuff. Apart from gtk, gdk, cairo and gobject, we also import the random module, as we will use it for a demo of the widget once we are done. We will also be using a few features specific to PyGTK 2.10 and above, so we check the version we are using (if gtk.pygtk_version < (2,10,0):). We want our OsdWindow widget to be a subclass of gtk.Window, so we declare our class with class OsdWindow(gtk.Window):.

import random
import pygtk
pygtk.require('2.0')
import gtk
from gtk import gdk
import cairo
import gobject

if gtk.pygtk_version < (2,10,0):
    print 'PyGtk 2.10.0 or later required'
    raise SystemExit

Dealing with signals

__gsignals__ = {
    'expose-event': 'override',
    'screen-changed': 'override',
    'clicked' : (gobject.SIGNAL_RUN_LAST,
                 gobject.TYPE_NONE,
                 ())
}

the built-in class-specific callbacks for expose-event and screen-changed, so we set the value of the relevant pairs to override.
Note: In most of the PyGTK related documentation, you may see the above process referred to as "overriding the class closures". In very simplified terms, a closure is an abstraction of the callback concept: along with the callback function, it contains related data such as user data supplied to the callback. When a signal is emitted, a series of closures are invoked, one of them being specific to the class and hence known as the class closure.
Finally, the third signal (clicked) is something that we define on our own, and the value part of the pair is a tuple containing the following members:
• gobject.SIGNAL_RUN_LAST: This is the signal flag, determining when the class closure for the signal will be invoked. SIGNAL_RUN_LAST indicates that the invocation should happen during the third stage of signal emission. For more information on this, you can check out the signals section of the GObject manual at http://developer.gnome.org/doc/API/2.0/gobject/signal.html.
• gobject.TYPE_NONE: This signal does not return anything, so the second value is set to gobject.TYPE_NONE.
• The third value is an empty tuple. This tuple is supposed to contain all the parameters to the signal. We have none, so the tuple is empty.
Once you define a custom signal for yourself, you can emit the signal whenever you want with the emit() method (we will be coming back to this later in the code).

Dealing with properties

__gproperties__ = {
    'level': (gobject.TYPE_FLOAT,
              'OSD level',
              'value for the OSD level indicator',
              0, 1, 0.5, gobject.PARAM_READWRITE)
}

The properties of our widget are specified via the __gproperties__ dictionary. We need only one property, called level, which is the key of the first (and only) pair in the dictionary. The value is a tuple containing the following members:

and do_set_property() be defined. These are called whenever someone tries to access these properties. For our example, the methods are described in Listing 1.
Note: When you have property names with more than one word, GObject translates the - (hyphen) to _ (underscore) and vice versa. So a property representing a Python variable "update_speed" would be translated into "update-speed" by GObject. This is something that you should keep in mind while working on your code.

The constructor
The constructor for our widget (Listing 2) takes the initial OSD level as an argument. It calls the constructor of the widget's superclass (gtk.Window) and sets the type to gtk.WINDOW_POPUP so that the window manager ignores our window. Thus the window will remain undecorated, it will not appear in the panel and
And in the final step of our constructor, we set the level property of our widget.

LISTING 3
def do_expose_event(self, event):
    ctx = self.window.cairo_create()

LISTING 5
#!/usr/bin/env python

# Demonstration of custom PyGTK widgets
# Author: Sayamindu Dasgupta

import random

import pygtk
pygtk.require('2.0')
import gtk
from gtk import gdk
import cairo
import gobject


if gtk.pygtk_version < (2,10,0):
    print 'PyGtk 2.10.0 or later required'
    raise SystemExit

class OsdWindow(gtk.Window):
    __gsignals__ = {
        'expose-event': 'override',
        'screen-changed': 'override',
        'clicked' : (gobject.SIGNAL_RUN_LAST,
                     gobject.TYPE_NONE,
                     ())
    }

LISTING 5
        height = event.area.height
        x1 = x0 + width
        y1 = y0 + height
        radius = 40
        ctx.move_to(x0, y0+radius)
        ctx.curve_to(x0, y0+radius, x0, y0, x0+radius, y0)
        ctx.line_to(x1-radius, y0) # Top line segment
        ctx.curve_to(x1-radius, y0, x1, y0, x1, y0+radius)
        ctx.line_to(x1, y1-radius) # Right line segment
        ctx.curve_to(x1, y1-radius, x1, y1, x1-radius, y1)
        ctx.line_to(x0+radius, y1) # Bottom line segment
        ctx.curve_to(x0+radius, y1, x0, y1, x0, y1-radius)
        ctx.close_path() # Left line segment
        if alpha:
            ctx.set_source_rgba(0, 0, 0, 0.5)
        else:
            ctx.set_source_rgb(0, 0, 0)
        ctx.fill()

        x0 = event.area.width/4
        y0 = event.area.height/3
        width = event.area.width/2
        height = event.area.height/2
        ctx.set_line_width(5)
        ctx.move_to(x0, y0)
        ctx.rel_line_to(width/2, 0)
        ctx.rel_line_to(width/3, -height/4)
(we calculate the originating point for the rectangle, as well as its dimensions, based on the dimensions of the widget). Once the path is created, we set the source to black with 50% transparency and call the fill() method.
Once the rectangle has been drawn, we follow a similar pattern of calculating our coordinates based on the size of the widget to draw the speaker icon. Note the use of the relative-coordinate versions of the drawing methods while creating the speaker icon.
For the volume level indicator bar, we use a hack. We essentially draw a thick dashed line which looks like a bar.

ctx.set_line_width(10)
ctx.set_dash((10, 5), 0)
ctx.move_to(x0, y0)
ctx.line_to(length*self.level+x0, y0)
ctx.stroke()

The first line sets the line width to 10 pixels. The second

forces the widget to redraw so that the level indicator bar gets updated accordingly. This is done via the methods invalidate_rect() and process_updates(), which together send a synthetic expose event to the widget.

And in the end, the demonstration
The demonstration (Listing 4) is fairly straightforward. The widget is displayed on screen, and its level property value is changed every second (via gobject.timeout_add()) to some random value. On clicking, gtk.main_quit() is called, which terminates the demo.

Final words and conclusion
This was a small demo (for the entire program's code, see
I recently needed to access shared schedule information stored on an Exchange-like mail and calendaring server. In this article, I will discuss how I combined an existing third-party open source library with the tools in the Python standard library to create a command line program called mailbox2ics for converting the calendar data into a format I could bring into my desktop client directly. The final product is just under 140 lines long, including command line switch handling, some error processing, and debug statements; far shorter than I had anticipated. The output file produced can be consumed by any scheduling client which supports the iCalendar standard.
Using Exchange, or a compatible replacement, for email and scheduling makes sense for many environments. The client program, Microsoft Outlook, is usually familiar to non-technical staff members, who are able to hit the ground running instead of trying to figure out how to accomplish their basic communication tasks. However, my laptop runs Mac OS X and I do not have Outlook. Purchasing a copy of Outlook at my own expense, not to mention inflicting further software bloat on my already crowded computer, seemed like a suboptimal solution. Changing the server software was also not an option. A majority of the users already had Outlook and were accustomed to using it for their scheduling, and I did not want to have to support a different server platform. What I needed, then, was a way to pull the data out of the existing server so I could convert it to a format that I could use with my usual tools: Apple's iCal and Mail.
With iCal, as with many other standards-compliant calendar tools, it is possible to subscribe to calendar data feeds. Unfortunately, the server we were using did not have the ability to export the schedule data in a standard format using a single file or URL. However, the server did provide access to the calendar data via IMAP using shared public folders. I decided to write a Python program to extract the data from the server and convert it into a usable feed. The feed could then be passed to

REQUIREMENTS
PYTHON: 2.x
Other Software: Max M's icalendar library, from http://codespeak.net/icalendar/
Useful/Related Links:
• Source for this program
http://www.doughellmann.com/projects/mailbox2ics/
• RFC 2445 - iCalendar specification
http://www.ietf.org/rfc/rfc2445.txt
• IMAP specification
http://www.ietf.org/rfc/rfc3501.txt
• Python standard library imaplib documentation
http://docs.python.org/lib/module-imaplib.html

mail_server = imaplib.IMAP4_SSL(hostname)
mail_server.login(username, password)

It is also possible to use IMAP over a non-standard port, when necessary. In that case, the caller can pass the port as an additional argument to imaplib.IMAP4_SSL(). To work with an IMAP server without the SSL encryption layer, you can use the IMAP4 class, but using SSL is definitely preferred whenever possible.

mail_server = imaplib.IMAP4_SSL(hostname, port)
mail_server.login(username, password)

The connection to the IMAP server is "stateful". The client remembers which methods have been called on it, and changes its internal state to reflect those calls. The internal state is used to detect logical errors in the sequence of method calls without a round-trip to the server.
On an IMAP server, messages are organized into "mail-
such as the headers or body.

(typ, [message_ids]) = mail_server.search(None, 'ALL')
message_ids = message_ids.split()

Individual messages are retrieved via fetch(). If only part of the message is desired (size, envelope, body), that part can be fetched to limit bandwidth. I could not predict which subset of the message body might include the attachments I wanted, so it was simplest for me to download the entire message. Calling fetch("(RFC822)") returns a string containing the MIME-encoded version of the message with all headers intact.

typ, message_parts = mail_server.fetch(
    message_ids[0], '(RFC822)')
message_body = message_parts[0][1]

Once the message body had been downloaded, the next

iCalendar event notification is via an email attachment. Most standard calendaring tools, such as iCal and Outlook, generate these email messages when you initially "invite" another participant to a meeting, or update an existing meeting description. The iCalendar standard says the file should have the filename extension ICS and the MIME type text/calendar. The input data for mailbox2ics came from email attachments of this type.
The iCalendar format is text-based. A simple example of an ICS file with a single event is provided in Listing 1. Calendar events have properties to indicate who was invited to an event, who originated it, where and when it will be held, and all of the other expected bits of information important for a scheduled event. Each property of the event is encoded on its own line, with long values wrapped onto multiple lines in a well-defined way to allow the original content to be reconstructed by a client receiving the iCalendar representation of the data. Some
…on the email package discussed previously, so working with Calendar instances and email.Message instances is similar. Use the class method Calendar.from_string() to parse the text representation of the calendar data to create a Calendar instance populated with all of the properties and subcomponents described in the input data.

    from icalendar import Calendar, Event
    cal_data = Calendar.from_string(
        open('sample.ics', 'rb').read())

Once you have instantiated the Calendar object, there are two different ways to iterate through its components: via the walk() method or the subcomponents attribute. Using walk() will traverse the entire tree and let you process each component in the tree individually. Accessing the subcomponents list directly lets you work with a larger portion of the calendar data tree at one time. Properties of an individual component, such as the summary or start date, are accessed via the __getitem__() API, just as with a standard Python dictionary. The property names are not case sensitive.
For example, to print the "SUMMARY" field values from all top level events in a calendar, you would first iterate over the subcomponents, then check the name attribute to determine the component type. If the type is VEVENT, then the summary can be accessed and printed.

    for event in cal_data.subcomponents:
        if event.name == 'VEVENT':
            print 'EVENT:', event['SUMMARY']

While most of the ICS attachments in my input data would be made up of one VCALENDAR component with one VEVENT subcomponent, I did not want to require this limitation. The calendars are writable by anyone in the organization, so while it was unlikely that anyone would have added a VTODO or VJOURNAL to public data, I could not count on it. Checking for VEVENT as I scanned…
…the server via IMAP, parse each message looking for the ICS attachments, parse them to produce another ICS file, and import that file into my calendar client. All that remained was to tie the pieces together and give it a user interface. The source for the resulting program, mailbox2ics.py, is provided in Listing 2.

LISTING 2
    1. #!/usr/bin/env python
    2. # mailbox2ics.py
    3.
    4. """Convert the contents of an imap mailbox to an ICS file.
    5.
    6. This program scans an IMAP mailbox, reads in any messages with ICS
    7. files attached, and merges them into a single ICS file as output.
    8. """
    9.
    10. # Import system modules

LISTING 2: Continued...
    70.
    71.     # Connect to the mail server
    72.     if options.port is not None:
    73.         mail_server = imaplib.IMAP4_SSL(hostname, options.port)
    74.     else:
    75.         mail_server = imaplib.IMAP4_SSL(hostname)
    76.     (typ, [login_response]) = mail_server.login(username, password)
    77.     try:
    78.         # Process the mailboxes
    79.         for mailbox in mailboxes:

Since I wanted to set up the export job to run on a regular basis via cron, I chose a command line interface. The main() function for mailbox2ics.py starts out at line 24 with the usual sort of configuration for command line option processing via the optparse module. Listing 3 shows the help output produced when the program is run with the -h option.
The --password option can be used to specify the IMAP account password on the command line, but if you choose to use it, consider the security implications of embedding a password in the command line for a cron task or shell script. No matter how you specify the pass…
…the rest of the option switches. Each mailbox name is processed one at a time, in the for loop starting on line 79. After calling select() to change the IMAP context, the message ids of all of the messages in the mailbox are retrieved via a call to search(). The full content of each message in the mailbox is fetched in turn, and parsed with email.message_from_string(). Once the message has…
…of the ICS files only included one VEVENT anyway, but I did not want to miss anything important if that ever turned out not to be the case.

    for event in importing.subcomponents:
        if event.name != 'VEVENT':
            continue
        merged_calendar.add_component(event)
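The attachment-extraction and merge steps can be sketched with the standard library alone. This is a simplified, hypothetical stand-in rather than the mailbox2ics code: instead of the icalendar package it scans the unfolded lines for BEGIN:VEVENT ... END:VEVENT, and the function names and PRODID value are invented for the example. It assumes Python 3; in mailbox2ics the raw message text would come from the message_parts[0][1] value returned by fetch() (under Python 3 imaplib returns bytes, so email.message_from_bytes() would replace message_from_string()).

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def unfold(ics_text):
    """Undo iCalendar line folding: a line starting with a space or tab
    continues the previous line, minus that single leading character."""
    lines = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw:
            lines.append(raw)
    return lines


def vevent_blocks(ics_text):
    """Return each VEVENT as a list of unfolded lines (nested components
    such as VALARM stay inside it); VTODO, VJOURNAL, etc. are skipped."""
    blocks, current = [], None
    for line in unfold(ics_text):
        if current is None:
            if line.upper() == "BEGIN:VEVENT":
                current = [line]
        else:
            current.append(line)
            if line.upper() == "END:VEVENT":
                blocks.append(current)
                current = None
    return blocks


def calendar_parts(raw_message):
    """Yield the decoded text of every text/calendar part of a message."""
    message = email.message_from_string(raw_message)
    for part in message.walk():
        if part.get_content_type() == "text/calendar":
            charset = part.get_content_charset() or "utf-8"
            yield part.get_payload(decode=True).decode(charset)


def merge(raw_messages):
    """Copy the VEVENTs of all calendar attachments into one VCALENDAR."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//mailbox2ics sketch//EN"]  # hypothetical id
    for raw in raw_messages:
        for ics in calendar_parts(raw):
            for block in vevent_blocks(ics):
                lines.extend(block)
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


# Build a test message the way a calendar client might send an invitation.
ics = ("BEGIN:VCALENDAR\r\nVERSION:2.0\r\n"
       "BEGIN:VEVENT\r\nUID:1@example.com\r\nSUMMARY:Team\r\n  meeting\r\n"
       "END:VEVENT\r\n"
       "BEGIN:VTODO\r\nSUMMARY:Ignored\r\nEND:VTODO\r\n"
       "END:VCALENDAR\r\n")
invite = MIMEMultipart()
invite.attach(MIMEText("You are invited.", "plain"))
invite.attach(MIMEText(ics, "calendar"))

merged = merge([invite.as_string()])
print(merged)
```

Unlike a full iCalendar library, this sketch does not re-fold long lines on output and does not copy VTIMEZONE components, so events that refer to a TZID would also need their time zone definitions carried over.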
…for my needs. The program runs locally on a web server which has access to the IMAP server. For better security, it connects to the IMAP server as a user with restricted permissions. The ICS output file produced is written to a directory accessible to the web server software. This lets me serve the ICS file as static content on the web server to multiple subscribers. Access to the file through the web is protected by a password, to prevent unauthorized access.
Thoughts About Future Enhancements
Mailbox2ics does everything I need it to do, for now. There are a few obvious areas where it could be enhanced to make it more generally useful to other users with different needs, though. Input and output filtering…
…are readily available through the __getitem__() API of the Calendar instance and it would be simple to compare them against the pattern(s).
If a large amount of data is involved, either spread across several calendars or because there are a lot of events, it might also be useful to be able to update an existing cached file, rather than building the whole ICS file from scratch each time. Looking only at unread messages in the folder, for example, would let mailbox2ics skip downloading old events that are no longer relevant or already appear in the local ICS file. It could then initialize merged_calendar by reading from the local file before updating it with new events and rewriting the file. Caching some of the results in this way would place less load on the IMAP server, so the export could easily be run more frequently than once every 10 minutes.
In addition to filtering to reduce the information included in the output, it might also prove useful to add extra information by including component types other than VEVENT. For example, including VTODO would allow users to include a group action list in the group calendar. Most scheduling clients support filtering the to-do items and alarms out of calendars to which you subscribe, so if the values are included in a feed, individual users can always ignore the ones they choose.
As mentioned earlier, using the --password option to provide the password to the IMAP server is convenient, but not secure. For example, on some systems it is possible to see the arguments to programs using ps. This allows any user on the system to watch for mailbox2ics to run and observe the password used. A more secure way to provide the password is through a configuration file. The file can have filesystem permissions set so that only…
…ect illustrates two reasons why I enjoy developing with Python. First, difficult tasks are made easier through the power of the "batteries included" nature of Python's standard distribution. And second, coupling Python with the wide array of other open source libraries available lets you get the job done, even when you encounter those times when the Python standard library lacks the exact tool you need. Using the ICS file produced by mailbox2ics, I am now able to access the calendar data I need using my familiar tools, even though iCalendar is not supported directly by the group's calendar server.
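The configuration-file approach suggested under Thoughts About Future Enhancements could look roughly like this. It is a sketch under assumptions: the mailbox2ics.ini file name, the [imap] section and the read_imap_password() helper are hypothetical, the permission check relies on POSIX permission bits, and the code targets Python 3 (the module was named ConfigParser in Python 2).

```python
import configparser
import os
import stat
import tempfile


def read_imap_password(path, section="imap", option="password"):
    """Return the password stored in an INI-style file, refusing to use it
    if the group or other users have any access to the file."""
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            "%s is accessible by other users; run chmod 600 on it" % path)
    # interpolation=None so a '%' in the password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as handle:
        parser.read_file(handle)
    return parser.get(section, option)


with tempfile.TemporaryDirectory() as directory:
    config_path = os.path.join(directory, "mailbox2ics.ini")
    with open(config_path, "w") as handle:
        handle.write("[imap]\npassword = 100%secret\n")

    os.chmod(config_path, 0o600)          # owner read/write only
    password = read_imap_password(config_path)

    os.chmod(config_path, 0o644)          # now readable by everyone
    try:
        read_imap_password(config_path)
        rejected = False
    except PermissionError:
        rejected = True

print(password, rejected)
```

Refusing to start when the file is too permissive, rather than only warning, keeps a cron job from silently running with an exposed password.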
- Extending Python
- Working with IMAP and iCalendar
- Processing Web Forms Using Anonymous Functions & WSGI
- Creating custom PyGTK widgets with Cairo
For more info go to: http://www.pythonmagazine.com
If you're a web developer, you're well aware of the importance of forms in web development.
…lects information from users, but only from registered users. And let's further assume that you want to be able to maintain internal state across HTTP requests (e.g., so people don't have to keep on logging in to use the site). To meet these fundamental needs, you will probably need the following:
• something to map URLs to internal functions
• an authentication mechanism
• something to do form processing
• and probably a lot more!
…framework is WSGI compliant, those functions will be accepting the same arguments as before, and the webpage from step 2 can pass this information along and let the "Forms Middleware" do its thing.
Moving forward, we will begin to build the form middleware we just spoke about in step 3 above, and we'll finish up by talking about using the information provided by the middleware with anonymous functions to build SQL statements that can be used in your web application.
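Here is one way the "Forms Middleware" step could be sketched. It is hypothetical and not the Form class from the article's listings: the FormsMiddleware, required(), email() and post() names and the forms.values / forms.errors environ keys are invented for illustration. It also swaps cgi.FieldStorage for urllib.parse.parse_qs, because the cgi module was deprecated in Python 3.11 and removed in 3.13; as written it only handles application/x-www-form-urlencoded bodies. As in the article, an empty errors dictionary means every field validated.

```python
import io
import re
from urllib.parse import parse_qs

# The dot between domain labels is escaped, so "bad@hostcom" is rejected.
EMAIL_RE = re.compile(r"^[A-Za-z0-9_.+-]+@([A-Za-z0-9-]+\.)+[A-Za-z0-9]{2,}$")


def required(value):
    """Hypothetical validator: return the stripped value or raise."""
    if not value.strip():
        raise ValueError("This field is required")
    return value.strip()


def email(value):
    value = required(value)
    if not EMAIL_RE.match(value):
        raise ValueError("Must be a valid email (eg, 'john_doe@myhost.com')")
    return value


class FormsMiddleware(object):
    """Parse a urlencoded POST body, validate it, and hand both results
    to the wrapped WSGI application through the environ dictionary."""

    def __init__(self, app, validators):
        self.app = app
        self.validators = validators  # field name -> validator function

    def __call__(self, environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        body = environ["wsgi.input"].read(length).decode("utf-8")
        raw = parse_qs(body, keep_blank_values=True)
        values, errors = {}, {}
        for name, validate in self.validators.items():
            try:
                values[name] = validate(raw.get(name, [""])[0])
            except ValueError as exc:
                errors[name] = str(exc)
        environ["forms.values"] = values
        environ["forms.errors"] = errors
        return self.app(environ, start_response)


def signup_app(environ, start_response):
    errors = environ["forms.errors"]
    status = "400 Bad Request" if errors else "200 OK"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    if errors:
        return [("%s: %s\n" % item).encode("utf-8")
                for item in sorted(errors.items())]
    return [("Welcome, %s!" % environ["forms.values"]["first"]).encode("utf-8")]


def post(app, body):
    """Call a WSGI app with a minimal POST environ; return (status, body)."""
    environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body)),
               "wsgi.input": io.BytesIO(body)}
    seen = []
    chunks = app(environ, lambda status, headers: seen.append(status))
    return seen[0], b"".join(chunks)


app = FormsMiddleware(signup_app, {"first": required, "email": email})
print(post(app, b"first=Ada&email=ada%40example.com"))
print(post(app, b"first=Ada&email=bad%40hostcom"))
```

Because the middleware only adds keys to environ, the wrapped application keeps the standard WSGI signature and could be swapped for any other WSGI component.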
…that all of our data types comply with the standard interface (e.g., if a subclass tried to ignore the "validates" method, our users wouldn't be able to ensure a value is valid). To see some standard data types that will be useful going forward, see Listing 1.
These are pretty simple, but you get the idea. You may even want to provide better error checking; for example, the Integer type will allow you to pass floats without complaining, but you might not want that. It all depends on what you'll be using the data for, but I'll leave it up to you to define more types and perhaps better error checking.
Essentially, what these two examples do is provide a service: they make sure that values passed to the form comply with certain data standards. Also, note that their 'validates' method will raise an exception if there is an error in processing the form data. That is, the value is untouched and we know the field validates or we get…
…ron" variable that I've been referring to throughout this article. But first, you can define the class as shown in Listing 3.
And that's basically it! The class boils down to:
• Create the form with a list of fields, which we assume (although we do not check for it above explicitly) are instances of our Field object.
• Populate the form with the values provided in the form.
• Validate the values given.
Step 2 probably requires a bit more discussion, so we'll do that in the next section. But for that one exception, the class should be pretty easy to follow. We ensure that we are WSGI compliant with our 'validates' method by accepting both 'environ' and 'start_response' arguments and returning an iterable of strings (the er…
…rely heavily on our 'Field' class as well as our 'DataType' class (although the latter is not self-evident from above, we know it to be the case). Note that we use a trick to return our values to the user: we rely on the fact that the caller of our function can test for errors simply by determining if they received the empty dictionary.
So, the interface is simple: we return a mapping of the field names to any errors encountered from that field. If no errors are encountered, an empty dict is returned. Furthermore, each error encountered contains vital information to the caller: which field contained an error (i.e., the key of the dictionary), and a descriptive message (provided by the underlying classes) telling the end user what the problem with the field was.
You may not think that being WSGI compliant above is terribly important, but what if you are writing a giant framework or website instead of just looking at this one example? Knowing that each and every component you deal with complies with the same interface, enabling you to "just use the component for the purpose it serves", is compelling. Also, since we've made the 2nd parameter optional, anyone who knows about our interface can just call the function with the first argument and leave the 2nd blank. Those who would like to call it blindly without knowing the interface specifically can call it with the default WSGI arguments and all is well. Now that we've discussed the framework, let's move on to discuss in a little more detail the environ variable we so cleverly used with Python's built-in cgi module.
WSGI - The Environ Variable & CGI
Above, we created a form class that we'll be using later to process data received from a web user. Within that class we used Python's cgi module to give our Form class legs in terms of getting at the data the user sent to us via the form. So what exactly is in that 'environ' variable that is sent as part of every WSGI call? Well, as Ben Bangert (http://groovie.org/) so aptly put it:
"…request body can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client's request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)"
(Taken directly from the WSGI PEP)
We also know from reading the cgi module's source that the FieldStorage class can be instantiated with a file pointer (fp, which defaults to sys.stdin) as well as an environment (environ, which defaults to os.environ). Since the environment we're given as part of the WSGI protocol contains such a file pointer and an environment variable, we pass them explicitly to the FieldStorage call. The cgi module takes over from there, and graciously provides us with a dictionary-like object that contains all the values sent by the user via the form!
How To Use the Form Class & Anonymous Functions to Process Data
We've now come to the point in this article where we have a WSGI component that can process and validate forms in a WSGI-compliant way. Note that if you wanted to, you could just as easily use another WSGI component that acted as middleware to process form submissions; again, that's the beauty of WSGI! But we'll use our own classes here because they are simple to use, easy to ex…
…if there are errors, errors will evaluate truthfully (i.e., things are *not* okay) and will map the problematic field names to their error messages. For example, if a user were to provide an email similar to the following:

    bad@hostcom

and all other fields (first and last name in our example) were fine, the resulting error dictionary would look like:

    {'email': "Must be a valid email (eg, 'john_doe@myhost.com')"}

You could then use this to regenerate the form, letting them know that the email field contained a bad value and they need to fix it. Helpful error messages can go a long way in making things go as smoothly as possible.
But we are still left to our own devices to generate the SQL used to take the data from the user and put it into the database. This is where we will begin to use anonymous functions (or "lambdas") to help us again. It might be overkill for the current example, but we'll move on to something more substantial once you've seen the technique in action.
Lambda functions are Python's way of representing anonymous functions (of single expressions, at least).
We also know that there is a standard protocol for entering values into our database (at least, there should be a standard protocol for entering values into your database). So one easy way to enter the information into the database is shown in Listing 6.
Now, that's quite a mouthful! Essentially, what we tried to do was link everything together so that the only thing we would need to change if our table were to change is the 'sql_map' dictionary. In that dictionary, we stored a list of columns and where we would like them to go in the insert statement. So, if we decided to later add 'phone number' to our database table, all we would…
…who operate in Pennsylvania but they don't care about anything else). Your standard search SQL might look like the following (adapted from my work on the portss.com website):

    sql = '''
    Select Distinct
        name, service, ... etc.
    From
        contractors
    Where
        %s
    '''

Pretty straightforward. But now you get into each of the additional filters that need to be applied, depending on the query sent by the end user. So we might set up a filter map as follows:

    filter_map = {
        'name' : lambda val : ('company LIKE %s',

LISTING 4

    import re

    class Email(Varchar):
        # The dot after each domain label is escaped; an unescaped "." matches
        # any character and would wrongly accept addresses like bad@hostcom.
        email_pattern = re.compile(r'^([a-zA-Z0-9_.\-+])+@(([a-zA-Z0-9-])+\.)+([a-zA-Z0-9]{2,4})+$')

        def validates(self, value):
            value = super(Email, self).validates(value)

            # Further error checking specific to emails
            if self.email_pattern.match(value):
                return value
            else:
                raise ValueError("Must be a valid email (eg, 'john_doe@myhost.com')")
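The filter-map idea can be sketched end to end with sqlite3 from the standard library. The table layout, column names and sample rows are hypothetical, and sqlite3 uses ? placeholders where drivers such as MySQLdb use %s. Each lambda returns a SQL fragment plus the value to bind, and the values are passed to execute() as parameters rather than pasted into the SQL string, so user input cannot change the statement itself.

```python
import sqlite3

# Hypothetical filter map: each lambda turns one user-supplied value into
# a (SQL fragment, bound parameter) pair.
filter_map = {
    "name": lambda val: ("company LIKE ?", "%" + val + "%"),
    "state": lambda val: ("state = ?", val.upper()),
    "service": lambda val: ("service = ?", val.lower()),
}


def build_search(query):
    """Build a parameterized search from the user's query dictionary,
    ignoring empty values and keys that have no entry in filter_map."""
    clauses, params = [], []
    for key in sorted(query):  # sorted only so the generated SQL is stable
        if key in filter_map and query[key]:
            clause, param = filter_map[key](query[key])
            clauses.append(clause)
            params.append(param)
    where = " AND ".join(clauses) or "1 = 1"  # no filters: match every row
    sql = "SELECT DISTINCT company, service FROM contractors WHERE " + where
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contractors (company TEXT, service TEXT, state TEXT)")
conn.executemany("INSERT INTO contractors VALUES (?, ?, ?)", [
    ("Acme Plumbing", "plumbing", "PA"),
    ("Acme Roofing", "roofing", "NJ"),
    ("Best Plumbing", "plumbing", "PA"),
])

sql, params = build_search({"name": "acme", "state": "pa", "color": "red"})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(params)  # ['%acme%', 'PA']
print(rows)    # [('Acme Plumbing', 'plumbing')]
```

Supporting a new search option then only means adding one lambda to filter_map; unknown keys such as "color" are simply ignored.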
Conclusion
In concluding, let's briefly recap what we've seen:
Extending Python
Using C to Make Python Smarter
by John Berninger
So, the stock distribution of Python isn't good enough for you, hmm? Well, that's not too surprising - it wasn't good enough for me, either. Naturally, I decided to Do Something about it - I taught Python a few new tricks by writing a new module specifically for what I wanted to do - and it did indeed make my life easier!
I'm going to create a new module that duplicates functionality already available in Python modules as an example, so please forgive the seeming duplication of effort. It's easier to make sense of things using "Hello, World!" examples.
…ment after the base OS install does take some work, but I'll show you the commands I used to get there. Once the default OS is installed, I added the "rpmdevtools" package with the command "yum install rpmdevtools", which I use for Fedora packaging. This package required the 'fakeroot' package be installed for dependencies, and also required updates on the following packages (again for dependencies):
• elfutils
• elfutils-libelf
…is done in the same way as…
…to get down and dirty in the C code itself and figure out what makes the module tick, and what to turn sideways to get it to tock, as well as tick.
So, are generators or translators useful? Absolutely. Are they applicable to your needs? Probably, but they might not always be - and it's those few times when they're not applicable that you're going to desperately need something that will let you finish the project by noon tomorrow for a presentation to the Board of Directors just before the big company-wide rollout announcement. So by all means, find the generators and translators and what-nots. But please, first understand what they're doing to you and for you.
Making A Python-callable function
…This will ensure that this error/exception object is…
…mediately above the function it's describing, but this is just convention - the string definition can be anywhere in relation to the function definition except inside the function itself.
For Our Next Trick, A Function That Does Work
So now that we understand how to define functions, let's write one that actually does something, then examine it in detail. See Listing 1 for the fully functional code.
The first thing we see is the documentation string. This tells us what the function will be doing: determining if a given number is even or odd. Yes, this is a really trivial function that's already available in Python - I did this deliberately so I could teach concepts and not have to worry about teaching behavior.
…ception), and a descriptor string that gets printed to standard out. We signal the error by returning NULL to our caller, which the interpreter handles as an exception.
Once we've successfully parsed the argument, it's time to do the actual work. Determining if a number is even is a simple matter, so we perform the test and store the result that we'll want to hand back to the calling Python interpreter in the returnValue variable.
The last line of the function is also very Python-esque. We want to return a Python object, not a simple integer, so we have to create that object via the Py_BuildValue() function. The first parameter is the format of the object, again a single integer in this case, then we see a list of sufficient variables to build the described object. This works much like a printf() or scanf() call - the number of parameters after the object structure must be exactly equal to the number of items within the object structure.
…extra Python information and makes the actual parameter available in its "bare" C (or C++) form - a plain integer, or a plain null-terminated string. This function becomes especially important when we start passing Python tuples, lists, and dictionaries to our C functions - we need that function to help us tell the C DSO (dynamic shared object) how to translate the list into an array, or the dictionary into a struct, or how to simply disassemble the tuple into its component strings, integers, and floating point numbers.
Aside Two: How Does Py_BuildValue Do That?
The Py_BuildValue() function is a fairly complex beast - it has to convert C or C++ objects into Python objects. In the Python source code, this routine works by call…
…ing PyInt_FromLong() versus Py_BuildValue(), but that is merely a cosmetic difference as we've seen above.
The important question here is why we used a helper function for a recursive call versus simply re-calling the getFactorial() function. The answer is remarkably simple - re-calling getFactorial would involve creating new Python objects from the interim results prior to each recursive call, and would also involve all the additional computation and memory overhead of storing and parsing those Python objects. Since we don't want to waste valuable computing resources, we simply made a helper function that deals with the C objects and variables natively.
One Plus One Equals... Whazzat?
Now we have to tell the main Python interpreter what our module can do. We do this by creating a method…
…Each entry in this table consists of four items. The first is the name of the function as it will be called inside the Python interpreter. The second is the name of the C function as defined in the C source for the module. The third entry tells us how parameters will be passed - the possible values are METH_VARARGS, METH_KEYWORDS, METH_VARARGS | METH_KEYWORDS, and 0. You should always use METH_VARARGS or METH_VARARGS | METH_KEYWORDS unless you really know what you're doing. The fourth parameter is simply a description of the function.
Aside: Varargs? Keywords?
When determining how to pass parameters to your module functions, you will most often use the METH_VA…
…ule by setting the ErrorObject to be the value of the item 'error' in the module's dictionary. Since that might be hard to follow (I know it was hard for me to write out), I'll try to explain by using code-like variable representations. Initially, we can imagine the module dictionary as being in the following form:

    testmodule: {
        'name' => 'testmodule';
        'size' => '4 functions';
        'author' => 'jwb';
    }

The actual dictionary wouldn't look anything like that, but that will serve the purposes of this illustration. After we returned from the PyDict_SetItemString() call, our dictionary would look like this:

    testmodule: {
        'name' => 'testmodule';
        'size' => '4 functions';
        'author' => 'jwb';
        'error' => ErrorObject;
    }

Mix thoroughly, bake at 350, allow to cool, and serve
Now we just have to put all the pieces together into a single file such as in Listing 4. Once this is done, we can compile the module into a .so, drop it into the proper directory for the distribution you're using (for Fedora 7, this would be /usr/lib/python2.5/site-packages/), and start using the module. It's just that simple!
Of course, we first have to know how to compile the .so. In its most basic form, this is two commands - the first one compiles the source to an object (.o) file using GCC. For our example, we would do the following:

    $ gcc -I /usr/include/python2.5 -c listing4.c

This causes the listing4.c program to be compiled to object format in the listing4.o file. The -I tells the compiler to search in /usr/include/python2.5 for included header files, which we need in order to find the Python.h file and include its definitions. The second command turns it into a shared object suitable for dynamic loading via a dlopen() call. Again with our example, we do the following:
The -shared option tells the linker to create a shared library as opposed to an executable, the -lpython2.5 tells the linker to also link in the libpython2.5.so shared library, and the -o tells the linker what filename to write - the default is a.out, which is usually not ideal.
Once you have that .so, that's what you drop into the /usr/lib/python2.5/site-packages directory. Using autotools, or even just an RPM spec file, will involve a slightly more complex compilation process, but ultimately the added complexity is just window dressing to what really needs to happen.
John Berninger is a senior linux systems administrator at Gilbarco Veeder-Root in Greensboro, NC. He's been doing linux and unix for far too long to want to be reminded of that number of years, including serving hard time as a Red Hat Consultant on Wall Street. He enjoys getting away from computers via photography and SCUBA diving.
Welcome to Python
Elegant XML parsing using the ElementTree Module
by Mark Mruss
Almost everyone needs to parse XML these days. They're either saving their own information in XML or loading someone else's data. This is why I was glad to learn that as of Python 2.5, the ElementTree XML package has been added to the standard library in the XML module.
What I like about the ElementTree module is that it just seems to make sense. This might seem like a strange thing to say about an XML module, but I've had to parse enough XML in my time to know that if an XML module makes sense the first time you use it, it's probably a keeper. The ElementTree module allows me to work with XML data in a way that is similar to how I think about XML data.
A subset of the full ElementTree module is available in the Python 2.5 standard library as xml.etree, but you don't have to use Python 2.5 in order to use the ElementTree module. If you are still using an older version of Python (1.5.2 or later) you can simply download the module from its website and manually install it on your system. The website also has very easy to follow installation instructions, which you should consult to avoid issues while installing ElementTree.
In general, the ElementTree module treats XML data as a list of lists. All XML has a root element with zero…
REQUIREMENTS
PYTHON: 2.2+
Other Software: Python 2.5 or ElementTree Module
Useful/Related Links:
• http://effbot.org/zone/element-index.htm
• http://effbot.org/zone/element-index.htm#installation
• http://docs.python.org/dev/whatsnew/whatsnew25.html
• http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.XML-function
• http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.parse-function
• http://docs.python.org/lib/module-xml.etree.ElementTree.html
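To make the "list of lists" description concrete, here is a small sketch using the xml.etree module that ships with Python 2.5 and later (the print() calls assume Python 3). It assumes the simple <root> document with two <child> elements that the article works with: an Element behaves like a list of its children, so len(), indexing and iteration all work on it.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<root><child>One</child><child>Two</child></root>")

print(root.tag)                        # root
print(len(root))                       # 2 children
print([child.text for child in root])  # ['One', 'Two']
print(root[1].tag, root[1].text)       # child Two
```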
Let's alter the XML that we are working with to add attributes to the elements and look at how we would parse that information.
If the XML uses attributes in addition to (or instead of) inner text, they can be accessed using the Element object's attrib attribute. The attrib attribute is a Python dictionary and is relatively easy to use:
When you run the code you get the following output:

    {'val': 'One'}
    {'val': 'Two'}

These are the attributes for each child element stored in a dictionary. Being able to work with an XML element's attributes as a Python dictionary is a great feature and fits well with the dynamic nature of XML attributes.
Writing XML
Now that we've tried our hand at reading XML, let's try creating some. If you understand the reading process, you should have no trouble understanding the creation process because it works in much the same manner. What we are going to do in this example is recreate the XML data that we were working with above.
The first step is to create our <root> element:
The second approach to creating a child element is to create an Element object separately (rather than a sub element) and append it to a parent Element object. The results are exactly the same - this is simply a different approach that may come in handy when creating your XML, or working with two sets of XML data.
First we create an Element object in the same way that…
This creates the child Element object and sets its text to "Two". We then append it to the root element:

    #now append
    root_element.append(child)

Pretty simple! Now, if we want to look at the contents of our root_element (or any other Element object for that matter) we can use the handy tostring function. It does exactly what its name suggests: it converts an Element object into a human readable string.

    #Let's see the results
    print ET.tostring(root_element)

To recap, have a look at the code in Listing 3. When you run this code you will get the following output:

    <root><child>One</child><child>Two</child></root>
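Both ways of adding a child described above, plus attribute handling, fit in one short sketch. It assumes Python 3, where tostring() returns bytes unless encoding="unicode" is passed (the article's Python 2.5 code prints the returned string directly); the val attribute mirrors the attribute example earlier in the article.

```python
from xml.etree import ElementTree as ET

# First approach: SubElement() creates a child and attaches it in one step.
root_element = ET.Element("root")
child = ET.SubElement(root_element, "child")
child.text = "One"

# Second approach: create a free-standing Element, then append() it.
child = ET.Element("child")
child.text = "Two"
root_element.append(child)

plain = ET.tostring(root_element, encoding="unicode")
print(plain)  # <root><child>One</child><child>Two</child></root>

# Attributes live in each element's attrib dictionary; set() adds one.
for element in root_element:
    element.set("val", element.text)
print([element.attrib for element in root_element])
# [{'val': 'One'}, {'val': 'Two'}]
print(ET.tostring(root_element, encoding="unicode"))
# <root><child val="One">One</child><child val="Two">Two</child></root>
```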
Reading XML files
Many times you won't be working with XML data that you explicitly create in your code. Instead, you will usually read the XML data in from a data source, work with it, and then save it back out when you are done. Fortunately, configuring ElementTree to work with different data sources is very easy. For example, let's take the XML data that we first used and save it to a file named our.xml in the same location as our Python file.
There are a few methods that we can use to load XML data from a file. We are going to use the parse function. This function is nice because it will accept, as a parameter, the path to a file or a "file-like" object. The term "file-like" is used on purpose because the object does not have to be a file object per se - it simply has to be an object that behaves in a file-like manner. A "file-like" object is an object that implements a "file-…
However, since we only want the directory name (not the full path and filename of our Python source file) we have to strip off the filename:

    xml_file = os.path.dirname(xml_file)

Now that we have the directory in which the our.xml file resides, all we have to do is append the our.xml filename to the xml_file variable. However, instead of just doing something like:

    xml_file += "/our.xml"

we will use the os module to join the two paths so that the resulting path is always correct regardless of what operating system our code is executed on:

    xml_file = os.path.join(xml_file, "our.xml")
Note: If you have any trouble understanding what any of…

    …
    <child>Three</child>
    </root>

Notice that in order to add a child to the root element we used the ElementTree object's getroot function. The getroot function simply returns the root Element object of the XML data.
Now that we have a third child element, let's write the XML data back out to our.xml. Thanks to ElementTree this is a painless experience:

    tree.write(xml_file)

That's it!
If we want to be really careful when writing the XML data out to a file, we'll watch out for exceptions. However, most of the time the write method will succeed without throwing an exception; it is more important to be…
Reading from the Web
Working with a local file is very useful, but you might also be in a situation where you will have to work with an XML file that is located on the Internet, perhaps an RSS feed. Fortunately, since the parse function explained above works with file-like objects, loading a URL is very easy.
First off, you need to import the urllib module. It's a standard module that allows you to open URLs in a manner similar to opening local files:

    import urllib
<root>
<child>One</child>
<child>Two</child>
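The article's Python 2 code uses the urllib module. In Python 3 the equivalent `urlopen()` function lives in `urllib.request`. The sketch below is written for Python 3. It hands the object returned by `urlopen()` straight to `ET.parse()`, which works because that object is file-like. The names `parse_url`, `rss_item_titles` and `fake_opener` are hypothetical, as are the example URL and the sample RSS feed. The `opener` parameter exists only so the example can run offline.

```python
import io
import urllib.request
import xml.etree.ElementTree as ET


def parse_url(url, opener=urllib.request.urlopen):
    """Fetch url and parse the response body as XML.

    The response is file-like (it has read()), so it can be passed to
    ET.parse() exactly like an open local file.
    """
    with opener(url) as response:
        return ET.parse(response)


def rss_item_titles(tree):
    """Return the <title> text of each <item> in an RSS 2.0 feed."""
    return [item.findtext("title") for item in tree.getroot().iter("item")]


# Offline demonstration: a stand-in opener that returns an in-memory
# file-like object instead of touching the network.
SAMPLE_FEED = b"""<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First post</title></item>
<item><title>Second post</title></item>
</channel></rss>"""


def fake_opener(url):
    return io.BytesIO(SAMPLE_FEED)


tree = parse_url("http://example.com/feed.xml", opener=fake_opener)
print(rss_item_titles(tree))  # ['First post', 'Second post']
```

Against a live feed you would call `parse_url()` with just the URL and let it use `urllib.request.urlopen`.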
FOOTNOTES
[1] http://docs.python.org/whatsnew/modules.html#SECTION0001420000000000000000
[2] http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.XML-function
[3] http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.ElementTree-class

For the last seven years Mark Mruss has worked as a software developer, programming in the much-maligned C++. In 2005 Mark decided it was time to add another language to his arsenal. After reading Eric Raymond's well-known article "Why Python?" he set his sights on the inviting world of Python.

LISTING 4

#!/usr/bin/env python

from xml.etree import ElementTree as ET
import os

def main():

    xml_file = os.path.abspath(__file__)
    xml_file = os.path.dirname(xml_file)
    xml_file = os.path.join(xml_file, "our.xml")

    try:
        tree = ET.parse(xml_file)
    except Exception, inst:
        print "Unexpected error opening %s: %s" % (xml_file, inst)
        return

    child = ET.SubElement(tree.getroot(), "child")
    child.text = "Three"

    try:
        tree.write(xml_file)
    except Exception, inst:
        print "Unexpected error writing to file %s: %s" % (xml_file, inst)
        return

if __name__ == "__main__":
    # Someone is launching this directly
    main()
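Listing 4 is Python 2 code. Python 3 makes print a function and treats the `except Exception, inst` form as a syntax error. For readers on Python 3, the listing might be adapted roughly as follows. This is a sketch, not the article's code. The hypothetical `add_child()` helper takes the path as a parameter so it can be exercised on a temporary copy of our.xml. It also catches the narrower `OSError` and `ET.ParseError` rather than every `Exception`; that narrowing is a design choice of this sketch.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def add_child(xml_file, text="Three"):
    """Hypothetical helper: append <child>text</child> to the root of
    xml_file and save the file. Returns True on success, False on error."""
    try:
        tree = ET.parse(xml_file)
    except (OSError, ET.ParseError) as inst:
        print("Unexpected error opening %s: %s" % (xml_file, inst))
        return False

    # getroot() returns the root Element; SubElement() appends a new child.
    child = ET.SubElement(tree.getroot(), "child")
    child.text = text

    try:
        tree.write(xml_file)
    except OSError as inst:
        print("Unexpected error writing to file %s: %s" % (xml_file, inst))
        return False
    return True


# Demonstration on a throwaway copy of our.xml in a temporary directory.
with tempfile.TemporaryDirectory() as directory:
    xml_file = os.path.join(directory, "our.xml")
    with open(xml_file, "w") as f:
        f.write("<root><child>One</child><child>Two</child></root>")
    ok = add_child(xml_file)
    result = [child.text for child in ET.parse(xml_file).getroot()]

print(ok, result)  # True ['One', 'Two', 'Three']
```

In a real script you would pass `add_child()` the path built from `__file__`, exactly as in Listing 4.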
Random Hits
The Python Community

I've always been fairly community-minded. In the 1970s I was Treasurer of DECUS in the UK, and in the 1980s I was Chairman of the Sun UK User Group. I accepted those positions because of a belief in the value of communities bound by a common interest in solving problems using specific technologies. This might seem a bit dangerous: the old saying that if the only tool you have is a hammer then all problems look like nails is very true, but the technologies I have been interested in all my professional life are much more flexible than hammers. That can be a good thing or a bad thing: there are many different types of nail too.

I suppose this focus on community has, to an extent, structured my career, such as it has been. If anyone can claim to have started PyCon I suppose it's me, and the primary impetus behind the action was my attendance at my first International Python Conference. This was a typical commercial affair costing around six hundred dollars (plus travel and hotel for those who weren't local to the event), and my initial response to it was "I bet there are a lot of people who would like to go to Python conferences but can't afford this".

So I became more involved with the affairs of the Python Software Foundation, and then Guido van Rossum (the inventor of Python) asked me to chair the first PyCon in 2003. We could have gone in for extensive planning sessions, but we might still be planning the first PyCon had we done that; no "big design up front" for the agile community! As Win Borden wrote, "If you wait to do everything until you're sure it's right, you'll probably never do much of anything."

That first PyCon had an atmosphere I shall never forget. It was almost as though the convicts had taken over the prison, and people were alight with the tangible sense of new possibilities. This was inevitably followed by PyCon DC 2004 and 2005, and now we've had PyCon TX 2006 and 2007, with a change of venue because we were victims of our own success: we attracted around 250 people in the first year, and by the third year had clearly outgrown our original home at George Washington University. The delegate count was almost 600 in 2007, so by most reasonable standards I guess the idea can be considered a success.

further demonstration of the effectiveness of the open source approach, and PyCon planning has always been a fairly open process.

How is PyCon "better" than the old International Python Conference? Well, for a start, it is way more affordable. Although I have at times worked in the proprietary systems world I have never felt entirely comfortable