Python and Generators

Published by motek
This article discusses the concept of generators, partly as a convenient bridge between imperative and functional programming. The example also continues the exploration of Python as a dynamic language - this time in a very literal way. Module inspect is used to extract useful information from the source code of a running program.

Published by: motek on Apr 09, 2008
Copyright:Attribution Non-commercial

Generators, comments and the data fetish
Author: Marcin Swiatek, Visimatik Inc., 2008

I worked on a tool for navigating data hierarchies stored in HDF5 files. As usually happens, at some point I wanted to convince myself that the program would be adequate for the intended purpose [1]. To this end, I needed to exercise it against a reasonably wide set of exemplary data. While I already had examples downloaded from the HDF5 web site and the datasets I had generated for my project, I needed more still. So I contrived a tool to help me quickly populate tables in an HDF5 file of arbitrary structure with random data [2].

This is where things get awkward. Given the sheer thrill of writing test code and the comforting obviousness of several solutions to the problem, how do I convince at least one person to stay with me past this paragraph? I promise to spend little time on trivial matters, like generating random numbers or the syntax of loop statements. The article will focus on generators and reflection and suggest a practical and entertaining use of these fairly obscure, yet very useful, aspects of Python.

This document and the associated source code can be downloaded from www.visimatik.com.
The set-up
Frequently the problem statement suggests possible solutions. And while prudent programmers should treat solutions that 'invite themselves' with distrust, in this case the description is best taken at face value. The tandem routines PopulateTablesInFile and PopulateTable (Listing 1) implement a straightforward strategy: find all tables in the file and, against each table, execute the prescribed number of insertions. An insertion consists of an assignment (each field gets assigned a new value), followed by a call to the append method on an object representing the table's row.

The only aspect that calls for comment is how assigned values are produced. This task is delegated to callable objects I will refer to (somewhat awkwardly) as 'data makers' [3]. It is rather obvious what a 'maker' needs to do: when called, it returns a new value of the expected type. Values for all column types, except strings, are ultimately generated by calls to functions imported from the standard library module random.

It seems reasonable to associate these 'makers' with data types, or, more exactly, with types of types: kinds [4]. However, I decided to have the front-line 'maker holder' indexed by column descriptions, rather than just types. Column descriptors carry information about column type and for the purpose of this exercise can be reduced to kinds. In the future, however, they will let me introduce another level of indirection between the dataMaker parameter of PopulateTable and the KindMakers dictionary. I plan to use it to populate columns with variates of different distributions, depending on data semantics.

PopulateTablesInFile constitutes the highest-level interface to the 'table stuffer' module (TableStuffer.py) and is used by the actual command script (TestMaker.py). There is nothing remarkable about the command line script and we will spend no time analyzing it. Structures of HDF files the script can create have been predefined using the mechanism discussed previously. These definitions can be found in Schemas/Canned.py. You will need pytables and HDF5 libraries to run the example.

[1] This process is usually called 'testing'.
[2] There is a very long argument to be made about using synthetic data for verification of algorithms and data analysis methods. In several fields I am familiar with, synthetic data is used often and to good results. However, in this text I intend to focus on some features of the programming language: Python. Building models for generators of synthetic data is out of scope of this article.
[3] I am trying hard to avoid calling these objects 'generators'. The name 'generators' might seem natural to describe a routine that generates something. But since this article is about generators in a different meaning, it is better kept this way. 'Makers' for objects serving data when called, 'generators' for...
[4] For practical purposes, the details (like representation length) of columns' types won't really matter. Makers return values in Python types, which are converted upon assignment. To keep things simple I have decided to ignore the distinction between fixed-length string columns in pytables and Python strings, the only practical situation where the exact type is indeed important.
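The idea of 'makers' indexed by kind can be illustrated with a small sketch. This is not the code from Listing 1: the names kind_makers and make_row, the kind labels and the value ranges are all assumptions made for illustration, and a plain dictionary stands in for a pytables row so that the example needs only the standard library.

```python
import random

# Hypothetical 'makers': zero-argument callables that return a fresh value
# of the expected Python type each time they are called.
kind_makers = {
    "int": lambda: random.randint(-1000, 1000),
    "float": lambda: random.uniform(0.0, 1.0),
    "bool": lambda: random.random() < 0.5,
}


def make_row(column_kinds, makers=kind_makers):
    """Build one row as a dict mapping column name to a newly made value.

    column_kinds maps a column name to its kind label (an assumed,
    simplified stand-in for a real column descriptor).
    """
    return {name: makers[kind]() for name, kind in column_kinds.items()}


# One 'insertion': every field gets a new value from the maker of its kind.
row = make_row({"id": "int", "weight": "float", "flag": "bool"})
print(row)
```

Keeping the makers in a dictionary means a different distribution can later be swapped in per column without touching the loop that fills the table.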
First, there was Word – or on the benefits of wordy comments
When it comes to populating text fields, the simplest solution is to generate random sequences of characters. However, this approach has considerable drawbacks. While the practice may be adequate for load testing, it will not do if the data is ever to be evaluated by a human being. Chance doesn't look life-like, and a person evaluating test results based on random symbols will have considerable difficulty finding and recalling any point of reference. The usual way out of this difficulty is to take words from a file containing natural language text (literature classics always work best). However, this is a truly light-weight project and hauling a large text file around with a tiny test script just seems out of proportion.

How could I do without, then? The program itself is text, although admittedly with a limited and peculiar vocabulary. Yet a good part of any source file is written for human readers: comments and doc-strings. This will be my source of test data.

Given Python's nature, it is relatively easy to access this information at runtime. The standard library module inspect offers several useful tools to get the task done. In the wordManufacture routine (Listing 2), I traverse the live graph of runtime artifacts. It is worth underscoring that this is not the graph of relations of the program's data (objects), which may refer to each other in very complex ways. Here, we will remain on the meta level, where relations between entities (such as module, type, class or method) are defined by the lexical structure of the program [5]. One could expect a graph with edges defined by relations of inheritance and containment to be free of cycles. Unfortunately, this is not the case:
>>> import inspect, __main__
>>> mmth = inspect.getmembers(__main__)
>>> mnm = [t[1] for t in mmth if t[0] == '__main__']
>>> mnm
[<module '__main__' (built-in)>]
>>> mnm[0] == __main__
True
>>>
The practical consequence of this observation is that the code cannot be treated as a tree. Normally, one would strive to devise an algorithm avoiding infinite loops. However, in this particular situation it made more sense for me to embrace infinity. After all, the program is supposed to generate words until the end of time.

The source code associated with a programming artifact may be inspected using appropriate routines from the inspect module. In particular, the functions getcomments and getdoc extract comments and documentation strings, the information that best suits the purpose. Each obtained string will likely contain several words: the smallest pieces of text that can be easily noticed, memorized and referenced. The object producing text data will thus return individual words. Notice that the algorithm gathering words will have to operate on a nested structure, a graph of objects containing lists of words. A nested iteration is easy to program, but there is an additional challenge: the words need to be returned one by one, in subsequent calls.

One could gather all text up front and return words from storage. This, however, requires the graph traversal problem to be addressed properly. The alternative is to encapsulate the process of data extraction in a class, which would progress the iteration 'on demand', when more data is needed. There is nothing unusual about this proposition. For instance, classes reducing a complex data structure or an algorithm to an iteration are the favorite vehicle of database access libraries.

Devising a class for the task would not be difficult. The only aspect that might call for special care is the question of representing the state of nested iterations in the object's variables.

Interestingly, in Python the task of finding a suitable representation can be delegated to the language itself.

[5] This is a major simplification. In reality, Python's dynamic character makes the lexical structure of the program more malleable than one may expect.
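To make the 'state of nested iterations' concrete, here is a minimal sketch of the class-based alternative, written for Python 3. The class WordIterator and its attribute names are hypothetical and do not come from the article's listings; a plain list of strings stands in for the comments and doc-strings that would be gathered from the program.

```python
class WordIterator:
    """Return words one at a time from a sequence of text fragments.

    The position of both loops (over fragments and over the words of the
    current fragment) has to be stored explicitly in instance variables.
    """

    def __init__(self, fragments):
        self._fragments = iter(fragments)  # state of the outer iteration
        self._words = iter(())             # state of the inner iteration

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                return next(self._words)
            except StopIteration:
                # Inner loop exhausted: advance the outer loop. When the
                # fragments run out, this next() raises StopIteration,
                # which ends the whole iteration.
                self._words = iter(next(self._fragments).split())


words = WordIterator(["find all tables", "", "append a row"])
first_word = next(words)    # one word per call
remaining = list(words)     # the rest, empty fragment skipped
```

Even in this small case, the bookkeeping of 'where was I?' is spread over two attributes and an exception handler. That bookkeeping is the part a language feature could take over.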
The state of a computation
Suppose you invoke wordManufacture() as presented in Listing 2. What will the call return? Well, it is easy if you try...
>>> p = wordManufacture()
>>> p
<generator object at 0x660d0>
>>> p.next()
'__main__'
>>>
Instead of returning a string, as one might have expected, wordManufacture() returns an object: a generator. According to the Python documentation, it is enough to place a yield statement in a function's body to make the interpreter create a wholly different code execution structure and, in place of a normal function, produce a generator function.

I find it convenient to look at generators in a similar way as at iterators [6]. One could say an iterator represents an iteration. By the same token, a generator can be thought of as an object representing, and permitting some control over, the flow of a computation. In this context 'control' amounts to something very much akin to stepping through an iteration. However, the routine which is to be controlled through a generator needs to be written in a specific way, with explicit definition of junction points where the generator function will communicate with the outside. In Python, these junction points are defined using the yield keyword.

Upon invocation of the next() method of a generator object, the related generator routine will execute up until the next yield in its code. A call to next() returns whatever the generator routine yields.

In earlier versions of Python (pre PEP-342), the interface of a generator was exactly that of an iterator and the construct lacked the two-way communication enabled by the send method of the interface. Thus, generator functions were just a way to code some iterations more conveniently. Most examples given in the literature reinforce this association.
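The stepping behaviour described above can be tried out with a small sketch. It is written for Python 3, where the built-in next(gen) replaces the gen.next() method of the 2008-era interpreter. The function doc_words and the throwaway demo module are hypothetical; this is not the wordManufacture routine of Listing 2, and unlike that routine it looks only at a module and its direct members, so it terminates.

```python
import inspect
import types


def doc_words(module):
    """Yield, one by one, the words found in the doc-strings of a module
    and of the functions and classes that are its direct members."""
    targets = [module] + [
        member
        for _, member in inspect.getmembers(module)
        if inspect.isfunction(member) or inspect.isclass(member)
    ]
    for obj in targets:
        doc = inspect.getdoc(obj)
        if doc:
            for word in doc.split():
                # Junction point: execution pauses here, and the state of
                # both loops is preserved until the next call.
                yield word


# A small module built on the fly, so the example is self-contained.
demo = types.ModuleType("demo", "Words live in comments")


def helper():
    """and in docstrings"""


demo.helper = helper

gen = doc_words(demo)   # no code of doc_words has run yet
first = next(gen)       # runs the body up to the first yield
rest = list(gen)        # resumes after that yield, until exhaustion
```

The nested loops are written exactly as they would be for printing the words; the only change is yield in place of print, and the interpreter keeps the loop state between calls.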
[6] When discussing the concept, I much prefer to focus on the design pattern, rather than on its interpretation in a specific language. Oddly enough, I could not find an on-line explanation that I liked. Wikipedia does have an article, but it is poorly written, in my opinion. This one seems good, but the link just looks like it is not going to last. The seminal GoF book 'Design Patterns' offers a good discussion, but it is not available as an on-line reference.
 
Generators in Python are now something more than just enhanced iterators. In the scenario described here, one-way communication implies that one routine 'plays back' another, as if having it perform a certain task. However, it is possible to communicate in both directions, through the send method of the generator interface and the return value of the yield instruction. This enables compositions where two or more routines collaborate on some task (the term for that is, I believe, cooperative multitasking). In other words, generator functions can be coroutines, with all associated benefits. For instance, it is a natural way of expressing several interesting algorithms.

Programming coroutines is an interesting topic in its own right, but a fairly broad one, too. It will not be discussed here; instead, I refer all readers interested in writing coroutines in Python to the already mentioned PEP-342 and other resources on the web.
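For readers who want a first taste of the two-way communication before turning to PEP-342, here is a minimal sketch in Python 3. The running_average function is a common textbook illustration, not something taken from this article's source code.

```python
def running_average():
    """Coroutine: receives numbers through send() and yields the mean of
    everything received so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # yield hands the current average to the caller and then waits;
        # the value passed to send() becomes the result of the yield.
        value = yield average
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)                # prime the coroutine: run to the first yield
after_one = averager.send(10)
after_two = averager.send(20)
averager.close()              # finish the coroutine explicitly
```

The caller and the coroutine take turns: each send() resumes the routine at its yield, and the routine hands control back at the next one.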
