
Optical Character Recognition

A Major Qualifying Project Report submitted to the faculty of


GURU NANAK DEV INSTITUTE OF TECHNOLOGY
in partial fulfillment of the requirements
for the Diploma in Computer Sciences by

________________________
Kushagra Chadha
________________________
Amit Kumar

April 20, 2016

___________________________________
Professor Muneesh Meena, Major Advisor
___________________________________
Abstract

Our project aimed to understand, develop and improve the open source Optical Character
Recognition (OCR) software pyOCR so that it better handles some of the more complex
recognition problems, such as language-specific alphabets and special characters such as
mathematical symbols. We developed pyOCR to work with any language by adding support for
UTF-8 character encoding. An OCR system works in four stages: uploading a scanned image
from the computer; segmentation, in which the text zones are extracted from the image;
recognition of the text; and post-processing, in which the output of the previous stage goes
through error detection and correction. This report also describes the user interface provided
with the OCR system, through which a user can easily add to or modify the segmentation
produced by the system.


Table of Contents
Chapter 1: Background
1.1 Introduction
1.2 History of OCR
1.2.1 Template-Matching Method
1.2.2 Peephole Method
1.2.3 Structured Analysis Method
1.2.4 Factors influencing OCR software performance
1.3 Independent Component Analysis
1.4 Energy-based Models for sparse overcomplete representations
1.5 Finite State Transducers in Language and Speech Processing
1.5.1 Sequential Transducers
1.5.2 Weighted Finite State Transducers
1.5.3 Transducers in Language Modeling
1.6 Image File Formats
1.6.1 TIFF
1.6.2 PDF
1.6.3 PNG
1.6.4 JPEG
Chapter 2: SIP and PyQT
Introduction
2.1 License
2.2 Features
2.3 SIP Components
2.4 Preparing for SIP v5
2.5 Qt Support
2.6 Installation
2.6.1 Downloading
2.6.2 Configuring
2.6.3 Building
2.6.4 Configuring with Configuration Files
2.7 Using SIP
2.7.1 A Simple C++ Example
2.7.2 A More Complex C++ Example
2.7.3 Ownership of Objects
2.7.4 Types and Meta-types
2.7.5 Lazy Type Attributes
2.8 Support for Python’s Buffer Interface
2.9 Support for Wide Characters
2.10 The Python Global Interpreter Lock
2.11 Building a Private Copy of the sip Module
The SIP Command Line
2.12 SIP Specification Files
2.13 Variable Numbers of Arguments
2.14 Additional SIP Types
2.15 Python API for Applications
Chapter 3: pyTesser
3.1 Introduction:
3.2 Dependencies:
3.3 Installation:
3.4 Usage:
3.4 File Dependencies:
3.5 Python Image Library
3.5.1 Introduction
3.5.2 Image Archives
3.5.3 Image Display
3.5.4 Image Processing
3.5.5 Using the Image Class
Chapter 4: Core Program Source Code
//Importing key modules
//UI Implementation
//File Picker Implementation
//OCR Conversions
//Main implementation
//Calling Main
Chapter 5: Live Example
Conclusions
6.1 Results
6.2 Conclusions on pyOCR
6.3 Future Work
References
Chapter 1: Background

1.1 Introduction

We are moving toward a more digitized world. Computer and PDA screens are replacing
traditional books and newspapers. In addition, large paper archives require maintenance
because paper decays over time, which led to the idea of digitizing them instead of simply
scanning them. This requires recognition software that, ideally, reads as well as a human
does. Such OCR software is also needed for reading bank checks and postal addresses;
automating these two tasks can save many hours of human work.

These two major trends have led to OCR software being developed and licensed to OCR
contractors. “There is one notable exception to this, which is pyOCR open source OCR software
that we have developed”.

pyOCR was created by us on April 10, 2016 with the goal of providing an open source

OCR system capable of performing multiple digitization functions. The application of this

software ranged from general desktop use and simple document conversion to historical

document analysis and reading aids for visually impaired users.

1.2 History of OCR

The idea of OCR technology has been around for a long time and even predates electronic

computers.
Figure 1: Statistical Machine Design by Paul W. Handel

This is an image of the original OCR design proposed by Paul W. Handel in 1931. He applied for

a patent for a device “in which successive comparisons are made between a character and a

character image.” A photo-electric apparatus would be used to respond to a coincidence of a

character and an image. This means you would shine a light through a filter and, if the light

matches up with the correct character of the filter, enough light will come back through the filter

and trigger some acceptance mechanism for the corresponding character. This was the first

documented vision of this type of technology. The world has come a long way since this

prototype.
1.2.1 Template-Matching Method
In 1956, Kelner and Glauberman used magnetic shift registers to project two-dimensional
information onto one dimension. The purpose was to reduce complexity and make the
information easier to interpret. A printed input character on paper is scanned by a
photodetector through a slit. The light reflected from the paper allows the photodetector to
segment the character by measuring the proportion of black within the slit. This proportion
is sent to a register, which converts the analog values to digital values. These samples are
then matched to a template by taking the total sum of the differences between each sampled
value and the corresponding template value. While this machine was not commercialized, it
gives us important insight into the dimensionality of characters. In essence, characters are
two-dimensional, and if we want to reduce the dimension to one, we must change the shape of
the character for the machine to recognize it.

Figure 2: Illustration of 2-D reduction to 1-D by a slit. (a) An input numeral “4” and a slit
scanned from left to right. (b) Black area projected onto axis, the scanning direction of the slit.
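
The slit projection and template comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original machine's logic: the 5x5 glyphs, the function names and the use of absolute differences are all hypothetical choices made for the example.

```python
# Tiny 5x5 binary glyphs: 1 = black pixel, 0 = white pixel (illustrative data).
FOUR = [
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
ONE = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]


def slit_projection(glyph):
    """Proportion of black pixels seen through a vertical slit moved left to right."""
    height = len(glyph)
    width = len(glyph[0])
    return [sum(row[x] for row in glyph) / height for x in range(width)]


def template_distance(sample, template):
    """Total sum of absolute differences between sampled and template values."""
    return sum(abs(s - t) for s, t in zip(sample, template))


def classify(glyph, templates):
    """Return the name of the template whose 1-D profile is closest to the glyph's."""
    profile = slit_projection(glyph)
    return min(templates, key=lambda name: template_distance(profile, templates[name]))


TEMPLATES = {"4": slit_projection(FOUR), "1": slit_projection(ONE)}

# A "4" with one spurious black pixel is still closest to the "4" template.
noisy_four = [row[:] for row in FOUR]
noisy_four[0][1] = 1
print(classify(noisy_four, TEMPLATES))  # prints 4
```

Note that the projection discards the vertical arrangement of pixels, which is why two different characters can produce similar one-dimensional profiles.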
1.2.2 Peephole Method

This is the simplest logical template matching method. Pixels from different zones of the

binarized character are matched to template characters. An example would be in the letter A,

where a pixel would be selected from the white hole in the center, the black section of the stem,

and then some others outside of the letter.

Figure 3: Illustration of the peephole method.

Each template character would have its own mapping of these zones that could be matched with

the character that needs to be recognized. The peephole method was first implemented in a
machine called the Electronic Reading Automaton in 1957.


Figure 4: The Solartron Electronic Reading Automaton

This was produced by Solartron Electronics Groups Ltd. and was used on numbers printed from

a cash register. It could read 120 characters per second, which was quite fast for its time, and

used 100 peepholes to distinguish characters.
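
A minimal sketch of peephole matching follows. The glyphs, the peephole positions and the scoring rule (count of agreeing peepholes) are assumptions chosen for illustration; the Solartron machine used 100 peepholes and its exact logic is not described here.

```python
# Binarized 5x5 glyphs: 1 = black, 0 = white (illustrative data only).
GLYPH_A = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
GLYPH_L = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]

# Each template is a list of peepholes: (row, column, expected pixel value).
# For "A": the white hole, the black crossbar, a black stem, and white background.
PEEPHOLES = {
    "A": [(1, 2, 0), (2, 2, 1), (4, 0, 1), (4, 2, 0), (0, 0, 0)],
    "L": [(0, 0, 1), (4, 4, 1), (0, 4, 0), (2, 2, 0), (4, 2, 1)],
}


def peephole_score(glyph, peepholes):
    """Number of peepholes whose pixel value matches the template's expectation."""
    return sum(1 for r, c, expected in peepholes if glyph[r][c] == expected)


def recognize(glyph):
    """Pick the template with the most agreeing peepholes."""
    return max(PEEPHOLES, key=lambda name: peephole_score(glyph, PEEPHOLES[name]))


print(recognize(GLYPH_A))  # prints A
print(recognize(GLYPH_L))  # prints L
```

Only a handful of pixels are inspected per template, which is what made the method fast enough for hardware of the 1950s.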

1.2.3 Structured Analysis Method

It is very difficult to create a template for handwritten characters. The variations would be too

large to have an accurate or functional template. This is where the structure analysis method

came into play. This method analyzes the character as a structure that can be broken down into

parts. The features of these parts and the relationship between them are then observed to

determine the correct character. The issue with this method is how to choose these features and

relationships to properly identify all of the different possible characters.

If the peephole method is extended to the structured analysis method, peepholes can be

viewed on a larger scale. Instead of single pixels, we can now look at a slit or ‘stroke’ of pixels

and determine their relationship with other slits.


Figure 5: Extension of the peephole method to structure analysis.

This technique was first proposed in 1954 with William S. Rohland’s “Character Sensing

System” patent using a single vertical scan. The features of the slits are the number of black

regions present in each slit. This is called the cross counting technique.
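
The cross counting feature can be sketched as follows: for every vertical slit (column), count the separate black regions it crosses. The glyph and the function name are hypothetical and only serve the illustration.

```python
# Binarized 5x5 glyph of the letter A: 1 = black, 0 = white (illustrative data).
GLYPH_A = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]


def cross_counts(glyph):
    """For each column, count the runs of consecutive black pixels from top to bottom."""
    counts = []
    for x in range(len(glyph[0])):
        runs = 0
        previous = 0
        for row in glyph:
            # A new black region starts where a black pixel follows a white one.
            if row[x] == 1 and previous == 0:
                runs += 1
            previous = row[x]
        counts.append(runs)
    return counts


print(cross_counts(GLYPH_A))  # prints [1, 2, 2, 2, 1]
```

The resulting sequence describes the structure of the character (one stroke, then two, then one again) rather than its exact pixels, so it tolerates some variation in stroke thickness and position.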

1.2.4 Factors influencing OCR software performance

OCR results are mainly attributed to the OCR recognizer software, but there are other factors
that can have a considerable impact on the results. The simplest of these factors are the
scanning technique and its parameters.

The table below summarizes these factors and provides recommendations for

OCR scanning on historic newspapers and other old documents.


Process step: Obtain original source
Factors influencing OCR:
• Quality of original source
Recommended actions for historic newspapers:
• Use original hard copies if budget allows (digitization costs will be considerably higher
than for using microfilm)
• Hard copies used for microfilming/digitization should be the most complete and cleanest
version possible
• Use microfilm created after establishment and use of microfilm imaging standards (1990’s
or later)
• Use master negative microfilm only (first generation) or original copies, no second
generation copies.

Process step: Scan file
Factors influencing OCR:
• Scanning resolution and file format
Recommended actions for historic newspapers:
• Scanning resolution should be 300 dpi or above to capture as much image information as
possible
• File format to be lossless e.g. TIFF so that no image information (pixels) are lost.

Process step: Create good contrast between black and white in the file (Image preprocessing)
Factors influencing OCR:
• Bit depth of image
• Image optimization and binarization process
• Quality of source (density of microfilm)
Recommended actions for historic newspapers:
• Scan the image as grayscale or bi-tonal.
• Image optimization for OCR to increase contrast and density needs to be carried out prior
to OCR either in the scanning software or a customized program.
• If the images are grayscale, convert them to image optimized bi-tonal (binarization).
• Obtain best source quality.
• Check density of microfilm before scanning.

Process step: OCR software - Layout of page analyzed and broken down
Factors influencing OCR:
• Skewed pages
• Pages with complex layouts
• Adequate white space between lines, columns and at edge of page so that text boundaries
can be identified
Recommended actions for historic newspapers:
• De-skew pages in the image preprocessing step so that word lines are horizontal.
• Layout of pages and white space cannot be changed, work with what you have.

Process step: OCR software - Analyzing stroke edge of each character
Factors influencing OCR:
• Image optimization
• Quality of source
Recommended actions for historic newspapers:
• Optimize image for OCR so that character edges are smoothed, rounded, sharpened, contrast
increased prior to OCR.
• Obtain best source possible (marked, mouldy, faded source, characters not in sharp focus or
skewed on page negatively affects identification of characters).

Process step: OCR software - Matching character edges to pattern images and making decision
on what the character is
Factors influencing OCR:
• Pattern image in OCR software database
• Algorithms in OCR software
Recommended actions for historic newspapers:
• Select good OCR software.

Process step: OCR software – Matching whole words to dictionary and making decisions on
confidence
Factors influencing OCR:
• Algorithms and built in dictionaries in OCR software
Recommended actions for historic newspapers:
• Select good OCR software.

Process step: Train OCR engine
Factors influencing OCR:
• Depends on how much time you have available to train OCR
Recommended actions for historic newspapers:
• Purchase OCR software that has this ability.
• At present it is questionable if training is viable for large scale historic newspaper
projects
Table 1: Potential methods of improving OCR accuracy.

1.3 Independent Component Analysis

Independent Component Analysis (ICA) is a method that was developed with the goal of finding
a linear representation of nongaussian data so that the components are statistically
independent. Data is nongaussian if it does not follow a normal distribution. The cocktail
party problem is a good example of the need for a way to analyze mixed data. In this problem,
there are two signal sources, two people speaking at the same time, and two sensors,
microphones, that record them. We would like to take the mixed data of the two speakers
collected from these two microphones and separate it back into the original signals. Each
microphone records a different mixture of the signals because the microphones are located at
different positions in the room. If we denote the recorded signals by $x_1(t)$ and $x_2(t)$
and the original speech signals by $s_1(t)$ and $s_2(t)$, we can express this as a pair of
linear equations:

$$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t)$$
$$x_2(t) = a_{21} s_1(t) + a_{22} s_2(t)$$

where $a_{11}$, $a_{12}$, $a_{21}$ and $a_{22}$ are parameters that depend on the distances
of the microphones from the speakers. This gives us the nongaussian data we need to analyze
in an effort to recover the original signals.


Figure 6: The original signals.
Figure 7: The observed mixture of the source signals in Fig. 6.
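
The linear mixing model behind the cocktail party problem can be sketched as below. The two source signals and the mixing coefficients are hypothetical values chosen for illustration. The sketch also recovers the sources by inverting the known mixing matrix; this is only a sanity check of the model, since the whole point of ICA is to estimate that inverse when the coefficients are unknown.

```python
import math


def mix(a, s1, s2):
    """Apply x_i(t) = a[i][0] * s1(t) + a[i][1] * s2(t) sample by sample."""
    x1 = [a[0][0] * u + a[0][1] * v for u, v in zip(s1, s2)]
    x2 = [a[1][0] * u + a[1][1] * v for u, v in zip(s1, s2)]
    return x1, x2


def invert_2x2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


# Two hypothetical source signals sampled at 100 Hz for two seconds.
t = [k / 100 for k in range(200)]
s1 = [math.sin(2 * math.pi * 3 * tk) for tk in t]   # sinusoid
s2 = [2 * ((5 * tk) % 1) - 1 for tk in t]           # sawtooth

# Hypothetical mixing coefficients (they would depend on microphone positions).
A = [[0.8, 0.3],
     [0.4, 0.7]]

x1, x2 = mix(A, s1, s2)                # what the two microphones record
r1, r2 = mix(invert_2x2(A), x1, x2)    # unmixing with the (normally unknown) inverse

max_error = max(abs(u - v) for u, v in zip(r1 + r2, s1 + s2))
print(max_error < 1e-9)  # prints True
```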

In order to execute Independent Component Analysis properly, the data must go through some
initial standardization and must satisfy one fundamental condition: nongaussianity. To show
why Gaussian variables make ICA impossible, assume that the mixing matrix is orthogonal and
that the sources $s_1$ and $s_2$ are Gaussian. Then the mixtures $x_1$ and $x_2$ are
Gaussian, uncorrelated, and of unit variance. Their joint density is:

$$p(x_1, x_2) = \frac{1}{2\pi} \exp\left(-\frac{x_1^2 + x_2^2}{2}\right)$$

This distribution is shown in the following figure.


Figure 8: The multivariate distribution of two independent gaussian variables.

The density of this distribution is completely symmetric and does not contain any
information about the directions of the columns of the mixing matrix. Because that
information is missing, the mixing matrix cannot be estimated from Gaussian data. We
therefore need a measure of nongaussianity; two common choices are kurtosis and negentropy.

Kurtosis is the older method of measuring nongaussianity and is defined for a zero-mean
random variable $y$ as:

$$\mathrm{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$$

This simplifies to $E\{y^4\} - 3$ when $y$ has unit variance, so kurtosis can be interpreted
as a normalized version of the fourth moment $E\{y^4\}$. Kurtosis is zero for a Gaussian
random variable and is nonzero, either positive or negative, for most (though not all)
nongaussian random variables. For this reason we generally take the absolute value or the
square of kurtosis as a measure of nongaussianity.

Kurtosis has been commonly used in ICA because of its simple formulation and its low
computational cost: it can be estimated directly from the fourth moment of the sample data.
The analysis is further simplified by the following properties, which hold for independent
random variables $x_1$ and $x_2$ and a scalar $\alpha$:

$$\mathrm{kurt}(x_1 + x_2) = \mathrm{kurt}(x_1) + \mathrm{kurt}(x_2)$$
$$\mathrm{kurt}(\alpha x_1) = \alpha^4 \, \mathrm{kurt}(x_1)$$

Although kurtosis proved to be very handy for many applications, it has one major weakness:
its sensitivity to outliers. When the sample contains a few erroneous or extreme
observations, the estimate of kurtosis can be dominated by them and fail to reflect the
nongaussianity of the underlying distribution. This led to the development of another
measure called negentropy.
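
Both the behaviour of kurtosis and its sensitivity to outliers can be checked numerically. The sketch below uses the standard definition kurt(y) = E{y^4} - 3 (E{y^2})^2 on centered data; the sample sizes, the random seed and the outlier value are arbitrary choices made for illustration.

```python
import random


def kurtosis(samples):
    """Sample estimate of E{y^4} - 3 (E{y^2})^2, computed after centering the data."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [v - mean for v in samples]
    m2 = sum(v ** 2 for v in centered) / n
    m4 = sum(v ** 4 for v in centered) / n
    return m4 - 3 * m2 ** 2


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(50000)]
uniform = [rng.uniform(-1.0, 1.0) for _ in range(50000)]

# The same Gaussian sample with a single erroneous measurement appended.
with_outlier = gaussian + [25.0]

print("gaussian     :", kurtosis(gaussian))      # close to zero
print("uniform      :", kurtosis(uniform))       # negative (sub-Gaussian)
print("with outlier :", kurtosis(with_outlier))  # strongly positive
```

A single bad value out of roughly fifty thousand is enough to make a Gaussian sample look strongly nongaussian, because the value enters the estimate raised to the fourth power.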

As the name suggests, negentropy is based on entropy, a fundamental concept of information
theory. Entropy describes the amount of information gained by observing a given variable. A
large entropy value means the data is random and unpredictable.

For a discrete random variable $Y$ that takes the values $a_i$, entropy is expressed as
follows:

$$H(Y) = -\sum_i P(Y = a_i) \log P(Y = a_i)$$

In a similar manner, the (differential) entropy of a continuous random variable $y$ with
density $f(y)$ can be expressed as:

$$H(y) = -\int f(y) \log f(y) \, dy$$

Information theory establishes that, out of all random variables of equal variance, a
Gaussian variable has the largest entropy. In this sense the Gaussian distribution is the
most random distribution.

This result shows that we can obtain a measure of nongaussianity from differential entropy;
this measure is called negentropy.

For a variable $y$ we define its negentropy as:

$$J(y) = H(y_{gauss}) - H(y)$$

where $y_{gauss}$ is a Gaussian random variable that has the same covariance matrix as the
variable $y$.
Negentropy is always nonnegative, and it is zero if and only if $y$ has a Gaussian
distribution; thus the higher its value, the less Gaussian the variable is. Unlike kurtosis,
negentropy is computationally expensive. A solution to this problem is to use simpler
approximations. The classical approximation of negentropy was developed in 1987 by Jones and
Sibson as follows:

$$J(y) \approx \frac{1}{12} E\{y^3\}^2 + \frac{1}{48} \mathrm{kurt}(y)^2$$

with the assumption that $y$ has zero mean and unit variance.

A more robust approximation developed by Hyvärinen makes use of nonquadratic functions as
follows:

$$J(y) \approx \sum_{i=1}^{p} k_i \left[ E\{G_i(y)\} - E\{G_i(\nu)\} \right]^2$$

where the $k_i$ are positive constants, $\nu$ is a standardized Gaussian variable (zero mean,
unit variance) and the $G_i$ are nonquadratic functions.

A common use of this approximation is to take only one nonquadratic function $G$, for example

$$G(y) = \frac{1}{a} \log \cosh(a y), \quad 1 \le a \le 2$$

and the approximation then takes the form:

$$J(y) \propto \left[ E\{G(y)\} - E\{G(\nu)\} \right]^2$$

We have thus obtained approximations that combine computational simplicity comparable to
that of kurtosis with the robustness of negentropy.
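
A single-function negentropy approximation of this kind can be sketched as follows. The contrast function G(u) = log cosh(u) is one commonly used nonquadratic choice and is an assumption here, as are the sample sizes and the seed; the result is only defined up to a positive constant factor, so the values are meaningful for comparison, not in absolute terms.

```python
import math
import random


def log_cosh(u):
    """Numerically stable log(cosh(u))."""
    a = abs(u)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def gaussian_expectation(g, half_width=10.0, steps=20001):
    """E{g(v)} for a standard Gaussian v, approximated by a Riemann sum."""
    dx = 2.0 * half_width / (steps - 1)
    total = 0.0
    for k in range(steps):
        x = -half_width + k * dx
        total += g(x) * math.exp(-0.5 * x * x)
    return total * dx / math.sqrt(2.0 * math.pi)


def standardize(samples):
    """Shift and scale the sample to zero mean and unit variance."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in samples) / n)
    return [(v - mean) / std for v in samples]


def negentropy_approx(samples, g=log_cosh):
    """[E{G(y)} - E{G(v)}]^2 for the standardized sample y and standard Gaussian v."""
    y = standardize(samples)
    e_gy = sum(g(v) for v in y) / len(y)
    return (e_gy - gaussian_expectation(g)) ** 2


rng = random.Random(1)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(100000)]
# The difference of two independent exponentials follows a Laplace distribution.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(100000)]

print("gaussian:", negentropy_approx(gaussian))  # very close to zero
print("laplace :", negentropy_approx(laplace))   # clearly larger
```

Because log cosh grows only linearly for large arguments, a few extreme observations influence this estimate far less than they influence the fourth-moment estimate used for kurtosis.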

In brief, gaussianity is ruled out because the joint density of Gaussian sources is
completely symmetric, so the observed data carries no information about the directions of
the columns of the mixing matrix.

As mentioned above, data preprocessing is crucial because it makes the ICA estimation simpler
and better conditioned. Two preprocessing techniques are commonly applied. “Centering”
consists of subtracting the mean vector of $x$,

$$m = E\{x\}$$

so as to make $x$ a zero-mean variable. “Whitening” is a linear transformation of the
observed vector $x$ such that its components become uncorrelated and their variances equal
unity; the transformed vector is then said to be white.
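
Centering and whitening can be sketched for two observed signals as shown below. Any linear transform that turns the covariance matrix into the identity is a valid whitening transform; this sketch uses the Cholesky factor of the 2x2 covariance matrix because it needs no linear algebra library, which is an assumption of the example (the ICA literature more often uses an eigenvalue decomposition). The test signals and coefficients are hypothetical.

```python
import math
import random


def center(xs):
    """Subtract the sample mean."""
    mean = sum(xs) / len(xs)
    return [v - mean for v in xs]


def covariance(a, b):
    """Sample covariance of two signals that are already centered."""
    return sum(u * v for u, v in zip(a, b)) / len(a)


def whiten_2d(x1, x2):
    """Center two signals, then decorrelate and scale them to unit variance.

    With covariance C = L L^T (L lower triangular), z = L^{-1} x has identity covariance.
    """
    x1, x2 = center(x1), center(x2)
    c11 = covariance(x1, x1)
    c12 = covariance(x1, x2)
    c22 = covariance(x2, x2)
    l11 = math.sqrt(c11)
    l21 = c12 / l11
    l22 = math.sqrt(c22 - l21 * l21)
    z1 = [u / l11 for u in x1]
    z2 = [(v - l21 * u) / l22 for u, v in zip(z1, x2)]
    return z1, z2


rng = random.Random(2)
s1 = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
s2 = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
# Correlated observations with nonzero means (hypothetical mixing coefficients).
x1 = [2.0 * u + 1.0 * v + 5.0 for u, v in zip(s1, s2)]
x2 = [1.0 * u - 3.0 * v - 1.0 for u, v in zip(s1, s2)]

z1, z2 = whiten_2d(x1, x2)
print(round(covariance(z1, z1), 6), round(covariance(z2, z2), 6))  # prints 1.0 1.0
```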

1.4 Energy-based Models for sparse overcomplete representations

Initially there were two approaches to Linear Component Analysis: the Density Modeling
approach and the Filtering approach. Density Modeling is based on causal generative models,
whereas the Filtering approach uses information maximization techniques. Energy-based models
emerged as a unification of these methods because they use Density Modeling techniques along
with Filtering techniques.

Figure 9: Approach diagram of Linear Component Analysis

Energy-based models associate an energy with each configuration of the relevant variables in
a graphical model. This is a powerful tool because it eliminates the need for proper
normalization of the probability distributions. “The parameters of an energy-based model
specify a deterministic mapping from an observation vector $x$ to a feature vector and the
feature vector determines a global energy, $E(x)$”. Note that the probability density
function of $x$ is expressed as:

$$p(x) = \frac{e^{-E(x)}}{Z}$$

where $Z$ is a normalization constant.
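
The relation between energy and probability, p(x) = exp(-E(x)) / Z, can be illustrated on a toy model with a finite number of configurations. The configuration names and energy values below are hypothetical; in a realistic continuous model Z is an integral over all x and is usually intractable, which is why methods that avoid computing it are attractive.

```python
import math


def boltzmann(energies):
    """Turn energies into probabilities: p(x) = exp(-E(x)) / Z."""
    weights = {x: math.exp(-e) for x, e in energies.items()}
    z = sum(weights.values())  # the normalization constant Z
    return {x: w / z for x, w in weights.items()}


# Lower energy means a more probable configuration.
energies = {"a": 0.0, "b": 1.0, "c": 3.0}
probs = boltzmann(energies)

for name in sorted(probs):
    print(name, round(probs[name], 4))
```

Ratios of probabilities depend only on energy differences, for example p(a) / p(b) = exp(E(b) - E(a)), so two configurations can be compared without knowing Z.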

1.5 Finite State Transducers in Language and Speech Processing

Finite State Machines are used in many areas of computational linguistics because of their
convenience and efficiency. They are well suited to describing the important local phenomena
encountered in empirical language study, and they tend to give a compact representation of
lexical rules, idioms, and clichés within a specific language.

For computational linguistics, we are mainly concerned with time and space efficiency. We
achieve time efficiency through the use of a deterministic machine: the time a deterministic
machine needs to process an input is usually linear in the size of the input. This fact alone
allows us to consider it optimal for time efficiency. We achieve space efficiency with
classical minimization algorithms for deterministic automata.



1.5.1 Sequential Transducers

This is an extension of the idea of deterministic automata with deterministic input. This type of

transducer is able to produce output strings or weights in addition to deterministically accepting

input. This quality is very useful and supports very efficient programs.
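
A sequential transducer can be sketched as a deterministic table lookup that emits an output string on every transition. The class name, the state numbering and the example rule (rewrite every "ab" as "x") are hypothetical choices for illustration. Note how the output for "a" is delayed until the next symbol is known.

```python
class SequentialTransducer:
    """Deterministic on its input: at most one transition per (state, symbol) pair."""

    def __init__(self, start, finals, transitions):
        # transitions maps (state, input_symbol) -> (next_state, output_string)
        self.start = start
        self.finals = set(finals)
        self.transitions = dict(transitions)

    def transduce(self, text):
        """Return the output string, or None if the input is rejected."""
        state = self.start
        output = []
        for symbol in text:
            key = (state, symbol)
            if key not in self.transitions:
                return None
            state, emitted = self.transitions[key]
            output.append(emitted)
        return "".join(output) if state in self.finals else None


# State 0: nothing pending. State 1: an "a" has been read but not yet emitted.
rewrite_ab = SequentialTransducer(
    start=0,
    finals={0},
    transitions={
        (0, "a"): (1, ""),    # hold the "a" back
        (0, "b"): (0, "b"),
        (1, "b"): (0, "x"),   # "ab" becomes "x"
        (1, "a"): (1, "a"),   # emit the earlier "a", keep holding the new one
    },
)

print(rewrite_ab.transduce("abab"))  # prints xx
print(rewrite_ab.transduce("aab"))   # prints ax
print(rewrite_ab.transduce("aba"))   # prints None (ends with an "a" still pending)
```

Each input symbol costs one dictionary lookup, so the running time is linear in the length of the input.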

1.5.2 Weighted Finite State Transducers

The use of finite state automata contributed a lot to the development of speech recognition
and of natural language processing. Such an automaton moves from state to state depending on
the input it receives until it reaches one of the final (accepting) states.

Figure 10: Simple Finite State Machine

Nowadays another type of finite state machine has become widespread in natural language
processing: the transducer.

These transducers keep all the functionality of a simple FSM (finite state machine) but

add a weight to each transition. In speech recognition for example this weight is the probability

for each state transition. In addition, in these transducers the input or output label of a transducer

transition can be null. Such a null means that no symbol needs to be consumed or output during

the transition. These null labels are needed to create variable length input and output strings.

They also provide a good way of delaying the output via an inner loop for example.

Composition is a common operation in the use of transducers. It provides a way of

combining different levels of representation. A common application of this in speech recognition

is the composition of a pronunciation lexicon with a word-level grammar to produce a phone-to-

word transducer whose word sequences are restricted to the grammar .

Figure 11: Example of transducer composition.
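
Composition of two weighted transducers can be sketched as follows. This is a simplified illustration under stated assumptions: the transducers contain no null labels, weights are probabilities that are multiplied along a path, and the dictionary layout, function names and toy symbols are all hypothetical. A phone-to-word transducer as described above would additionally need null labels, which this sketch leaves out.

```python
def compose(t1, t2):
    """Compose two null-free weighted transducers.

    An arc is (source, input, output, weight, destination). The result maps the
    inputs of t1 to the outputs of t2 wherever t1's output matches t2's input.
    """
    start = (t1["start"], t2["start"])
    arcs, finals = [], set()
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        q1, q2 = state
        if q1 in t1["finals"] and q2 in t2["finals"]:
            finals.add(state)
        for (s1, a, b, w1, d1) in t1["arcs"]:
            if s1 != q1:
                continue
            for (s2, b2, c, w2, d2) in t2["arcs"]:
                if s2 == q2 and b2 == b:
                    nxt = (d1, d2)
                    arcs.append((state, a, c, w1 * w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}


def transduce(t, inputs):
    """Return {output sequence: weight} for every accepting path over the inputs."""
    results = {}

    def walk(state, i, out, weight):
        if i == len(inputs):
            if state in t["finals"]:
                key = tuple(out)
                results[key] = results.get(key, 0.0) + weight
            return
        for (s, a, b, w, d) in t["arcs"]:
            if s == state and a == inputs[i]:
                walk(d, i + 1, out + [b], weight * w)

    walk(t["start"], 0, [], 1.0)
    return results


# Toy example: T1 maps letters to intermediate symbols, T2 maps those to labels.
T1 = {"start": 0, "finals": {0},
      "arcs": [(0, "a", "x", 0.6, 0), (0, "a", "y", 0.4, 0), (0, "b", "y", 1.0, 0)]}
T2 = {"start": 0, "finals": {0},
      "arcs": [(0, "x", "X", 1.0, 0), (0, "y", "Y", 0.5, 0), (0, "y", "Z", 0.5, 0)]}

T = compose(T1, T2)
paths = transduce(T, ["a", "b"])
for output in sorted(paths):
    print(output, round(paths[output], 3))
```

The composed machine never exposes the intermediate symbols x and y: it goes directly from the input alphabet of T1 to the output alphabet of T2, which is what makes composition useful for combining levels of representation.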


1.5.3 Transducers in Language Modeling

Initial approaches to language modeling used affix dictionaries to represent natural
languages. This method is convenient for languages like English: it stores a list of the
most common words along with their possible affixes. However, when more languages were
considered, it quickly became clear that such an approach fails with agglutinative languages.

An agglutinative language is a language in which words are formed by attaching a chain of
affixes to a root, each affix carrying its own grammatical meaning. In English, by contrast,
we generally add only a single suffix to obtain another word form, such as the suffix –ly for
adverbs. Hungarian is an agglutinative language, for which we needed to create a dictionary
and a language model in FST (finite state transducer) format. Such a language can be
represented by “having the last node of the portion of the FST, which encodes a given suffix,
contain outgoing arcs to the first states of portions of the FST which encode other
suffixes”. The advantage of this technique is that, when applied to all the possible affixes,
it yields a solid representation of the agglutinative nature of the language.

1.6 Image File Formats

There are many different file format options available for character recognition software.
We primarily dealt with PNG files because PNG was the only format usable in pyOCR, but we
faced some challenges during image conversion. Image quality has a huge impact on the
effectiveness of any OCR software, and when converting between formats one has to be aware of
lossy vs. lossless compression. These were the formats we ran into during this project:
1.6.1 TIFF

This is the Tagged Image File Format, which can be used as a single- or multi-image file
format (multiple pages in the same file). The TIFF format is very desirable because its most
common compression schemes are all lossless. This means that these types of compression
reduce the file size without losing any quality: the original image can later be restored
exactly.

1.6.2 PDF

Portable Document Format is an open standard originally created by Adobe. While the ability
of a PDF to contain both text and images is very useful for some applications, for OCR it is
an unnecessary feature that only adds to the file size. A TIFF is much more desirable because
it contains only images.

1.6.3 PNG

Portable Network Graphics (PNG) is a lossless image format and the one that is used by
pyOCR. It is an open, single-image color format that was created to replace the GIF image
format, which supports a maximum of 256 colors.

1.6.4 JPEG

The acronym ‘JPEG’ comes from the committee that created the file format, the Joint
Photographic Experts Group. This is a lossy image format, but its compression level can be
adjusted to trade off storage size against image quality. It is not ideal for OCR software,
but it can be used as long as the image is saved at high quality and is not recompressed
repeatedly.
Chapter 2: SIP and PyQT
Introduction
SIP is a tool for automatically generating Python bindings for C and C++ libraries. SIP was
originally developed in 1998 for PyQt - the Python bindings for the Qt GUI toolkit - but is
suitable for generating bindings for any C or C++ library.
This version of SIP generates bindings for Python v2.3 or later, including Python v3.
There are many other similar tools available. One of the original such tools is SWIG and, in fact,
SIP is so called because it started out as a small SWIG. Unlike SWIG, SIP is specifically
designed for bringing together Python and C/C++ and goes to great lengths to make the
integration as tight as possible.
The homepage for SIP is http://www.riverbankcomputing.com/software/sip. Here you will
always find the latest stable version and the latest version of this documentation.
SIP can also be downloaded from the Mercurial repository at
http://www.riverbankcomputing.com/hg/sip.

2.1 License
SIP is licensed under similar terms as Python itself. SIP is also licensed under the GPL (both v2
and v3). It is your choice as to which license you use. If you choose the GPL then any bindings
you create must be distributed under the terms of the GPL.

2.2 Features
SIP, and the bindings it produces, have the following features:
• bindings are fast to load and minimise memory consumption especially when only a
small sub-set of a large library is being used
• automatic conversion between standard Python and C/C++ data types
• overloading of functions and methods with different argument signatures
• support for Python’s keyword argument syntax
• support for both explicitly specified and automatically generated docstrings
• access to a C++ class’s protected methods
• the ability to define a Python class that is a sub-class of a C++ class, including abstract
C++ classes
• Python sub-classes can implement the __dtor__() method which will be called from the
C++ class’s virtual destructor
• support for ordinary C++ functions, class methods, static class methods, virtual class
methods and abstract class methods
• the ability to re-implement C++ virtual and abstract methods in Python
• support for global and class variables
• support for global and class operators
• support for C++ namespaces
• support for C++ templates
• support for C++ exceptions and wrapping them as Python exceptions
• the automatic generation of complementary rich comparison slots
• support for deprecation warnings
• the ability to define mappings between C++ classes and similar Python data types that are
automatically invoked
• the ability to automatically exploit any available run time type information to ensure that
the class of a Python instance object matches the class of the corresponding C++ instance
• the ability to change the type and meta-type of the Python object used to wrap a C/C++
data type
• full support of the Python global interpreter lock, including the ability to specify that a
C++ function or method may block, therefore allowing the lock to be released and other
Python threads to run
• support for consolidated modules where the generated wrapper code for a number of
related modules may be included in a single, possibly private, module
• support for the concept of ownership of a C++ instance (i.e. what part of the code is
responsible for calling the instance’s destructor) and how the ownership may change
during the execution of an application
• the ability to generate bindings for a C++ class library that itself is built on another C++
class library which also has had bindings generated so that the different bindings integrate
and share code properly
• a sophisticated versioning system that allows the full lifetime of a C++ class library,
including any platform specific or optional features, to be described in a single set of
specification files
• support for the automatic generation of PEP 484 type hint stub files
• the ability to include documentation in the specification files which can be extracted and
subsequently processed by external tools
• the ability to include copyright notices and licensing information in the specification files
that is automatically included in all generated source code
• a build system, written in Python, that you can extend to configure, compile and install
your own bindings without worrying about platform specific issues
• support for building your extensions using distutils
• SIP, and the bindings it produces, run under UNIX, Linux, Windows, MacOS/X,
Android and iOS.

2.3 SIP Components


SIP comprises a number of different components.
• The SIP code generator (sip). This processes .sip specification files and generates C or
C++ bindings. It is covered in detail in Using SIP.
• The SIP header file (sip.h). This contains definitions and data structures needed by the
generated C and C++ code.
• The SIP module (sip.so or sip.pyd). This is a Python extension module that is imported
automatically by SIP generated bindings and provides them with some common utility
functions. See also Python API for Applications.
• The SIP build system (sipconfig.py). This is a pure Python module that is created when
SIP is configured and encapsulates all the necessary information about your system
including relevant directory names, compiler and linker flags, and version numbers. It
also includes several Python classes and functions which help you write configuration
scripts for your own bindings. It is covered in detail in The Build System.
• The SIP distutils extension (sipdistutils.py). This is a distutils extension that can be used
to build your extension modules using distutils and is an alternative to writing
configuration scripts with the SIP build system. This can be as simple as adding your .sip
files to the list of files needed to build the extension module. It is covered in detail
in Building Your Extension with distutils.

2.4 Preparing for SIP v5


The syntax of a SIP specification file will change in SIP v5. The command line options to the SIP
code generator will also change. In order to help users manage the transition the following
approach will be adopted.
• Where possible, all incompatible changes will be first implemented in SIP v4.
• When an incompatible change is implemented, the old syntax will be deprecated (with a
warning message) but will be supported for the lifetime of v4.

2.5 Qt Support
SIP has specific support for the creation of bindings based on Digia’s Qt toolkit.
The SIP code generator understands the signal/slot type safe callback mechanism that Qt uses to
connect objects together. This allows applications to define new Python signals, and allows any
Python callable object to be used as a slot.
SIP itself does not require Qt to be installed.

2.6 Installation

2.6.1 Downloading
You can get the latest release of the SIP source code
from http://www.riverbankcomputing.com/software/sip/download.
SIP is also included with all of the major Linux distributions. However, it may be a version or
two out of date.
2.6.2 Configuring
After unpacking the source package (either a .tar.gz or a .zip file depending on your platform)
you should then check for any README files that relate to your platform.
Next you need to configure SIP by executing the configure.py script. For example:
python configure.py
This assumes that the Python interpreter is on your path. Something like the following may be
appropriate on Windows:
c:\python35\python configure.py
If you have multiple versions of Python installed then make sure you use the interpreter for
which you wish SIP to generate bindings.
The full set of command line options is:
--version
Display the SIP version number.
-h, --help
Display a help message.
--arch <ARCH>
Binaries for the MacOS/X architecture <ARCH> will be built. This option should be given
once for each architecture to be built. Specifying more than one architecture will cause a
universal binary to be created.
-b <DIR>, --bindir <DIR>
The SIP code generator will be installed in the directory <DIR>.
--configuration <FILE>
New in version 4.16.
<FILE> contains the configuration of the SIP build to be used instead of dynamically
introspecting the system and is typically used when cross-compiling. See Configuring with
Configuration Files.
-d <DIR>, --destdir <DIR>
The sip module will be installed in the directory <DIR>.
--deployment-target <VERSION>
New in version 4.12.1.
Each generated Makefile will set the MACOSX_DEPLOYMENT_TARGET environment
variable to <VERSION>. In order to work around bugs in some versions of Python, this
should be used instead of setting the environment variable in the shell.
-e <DIR>, --incdir <DIR>
The SIP header file will be installed in the directory <DIR>.
-k, --static
The sip module will be built as a static library. This is useful when building the sip module
as a Python builtin.
-n, --universal
The SIP code generator and module will be built as universal binaries under MacOS/X. If
the --arch option has not been specified then the universal binary will include
the i386 and ppc architectures.
--no-pyi
New in version 4.18.
This disables the installation of the sip.pyi type hints stub file.
--no-tools
New in version 4.16.
The SIP code generator and sipconfig module will not be installed.
-p <PLATFORM>, --platform <PLATFORM>
Explicitly specify the platform/compiler to be used by the build system, otherwise a
platform specific default will be used. The --show-platforms option will display all the
supported platform/compilers.
--pyi-dir <DIR>
New in version 4.18.
<DIR> is the name of the directory where the sip.pyi type hints stub file is installed. By
default this is the directory where the sip module is installed.
-s <SDK>, --sdk <SDK>
If the --universal option was given then this specifies the name of the SDK directory. If a
path is not given then it is assumed to be a sub-directory of
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs
or /Developer/SDKs.
-u, --debug
The sip module will be built with debugging symbols.
-v <DIR>, --sipdir <DIR>
By default .sip files will be installed in the directory <DIR>.
--show-platforms
The list of all supported platform/compilers will be displayed.
--show-build-macros
The list of all available build macros will be displayed.
--sip-module <NAME>
The sip module will be created with the name <NAME> rather than the
default sip. <NAME> may be of the form package.sub-package.module. See Building a
Private Copy of the sip Module for how to use this to create a private copy of
the sip module.
--sysroot <DIR>
New in version 4.16.
<DIR> is the name of an optional directory that replaces sys.prefix in the names of other
directories (specifically those specifying where the various SIP components will be
installed and where the Python include directories can be found). It is typically used when
cross-compiling or when building a static version of SIP. See Configuring with
Configuration Files.
--target-py-version <VERSION>
New in version 4.16.
<VERSION> is the major and minor version (e.g. 3.4) of the version of Python being
targeted. By default the version of Python being used to run the configure.py script is
used. It is typically used when cross-compiling. See Configuring with Configuration Files.
--use-qmake
New in version 4.16.
Normally the configure.py script uses SIP’s own build system to create the Makefiles for
the code generator and module. This option causes project files (.pro files) used by
Qt’s qmake program to be generated instead. qmake should then be run to generate the
Makefiles. This is particularly useful when cross-compiling.
The configure.py script takes many other options that allow the build system to be finely tuned.
These are of the form name=value or name+=value. The --show-build-macros option will display
each supported name, although not all are applicable to all platforms.
The name=value form means that value will replace the existing value of name.
The name+=value form means that value will be appended to the existing value of name.
For example, the following will disable support for C++ exceptions (and so reduce the size of
module binaries) when used with GCC:
python configure.py CXXFLAGS+=-fno-exceptions
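
The difference between the two forms can be sketched in a few lines of Python. This is only an illustration of the replace and append semantics described above, not the code in configure.py: the helper name apply_macro_args, the starting values and the use of a space as separator when appending are all assumptions.

```python
def apply_macro_args(macros, args):
    """Apply name=value and name+=value arguments to a copy of macros."""
    result = dict(macros)
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError("not a macro assignment: %r" % arg)
        if name.endswith("+"):
            # name+=value: append to whatever is already there.
            name = name[:-1]
            existing = result.get(name, "")
            result[name] = (existing + " " + value).strip()
        else:
            # name=value: replace the existing value.
            result[name] = value
    return result

# Made-up starting values, purely for the demonstration.
defaults = {"CXXFLAGS": "-O2", "CFLAGS": "-O2"}
updated = apply_macro_args(defaults, ["CXXFLAGS+=-fno-exceptions", "CFLAGS=-O0"])

print("CXXFLAGS:", updated["CXXFLAGS"])
print("CFLAGS:", updated["CFLAGS"])
```
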
A pure Python module called sipconfig.py is generated by configure.py. This defines
each name and its corresponding value. Looking at it will give you a good idea of how the build
system uses the different options. It is covered in detail in The Build System.

2.6.3 Building
The next step is to build SIP by running your platform’s make command. For example:
make
The final step is to install SIP by running the following command:
make install
(Depending on your system you may require root or administrator privileges.)
This will install the various SIP components.

2.6.4 Configuring with Configuration Files


The configure.py script normally introspects the Python installation of the interpreter running it
in order to determine the names of the various files and directories it needs. This is fine for a
native build of SIP but isn’t appropriate when cross-compiling. In this case it is possible to
supply a configuration file, specified using the --configuration option, which contains definitions
of all the required values.
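
To give an idea of what introspecting the Python installation produces, the following sketch prints a few comparable values using only the standard library. It is not how configure.py is implemented. The dictionary keys simply borrow the names used in this section, and module_dir is a hypothetical label.

```python
import sys
import sysconfig

paths = sysconfig.get_paths()

# Values that describe the interpreter running this script.
native = {
    "py_major": sys.version_info[0],
    "py_minor": sys.version_info[1],
    "py_platform": sys.platform,
    "py_inc_dir": paths["include"],   # directory that should hold Python.h
    "module_dir": paths["platlib"],   # where extension modules are installed
}

for name, value in sorted(native.items()):
    print(name, "=", value)
```

All of these describe the machine running the script, which is exactly why they are wrong for a cross-compiled target and have to be supplied in a file instead.
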
The format of a configuration file is as follows:
• a configuration item is a single line containing a name/value pair separated by =
• a value may include another value by embedding the name of that value surrounded by
%( and )
• comments begin with # and continue to the end of the line
• blank lines are ignored.
configure.py provides the following preset values for a configuration:
py_major
is the major version number of the target Python installation.
py_minor
is the minor version number of the target Python installation.
sysroot
is the name of the system root directory. This is specified with the --sysroot option.
The following is an example configuration file:

# The target Python installation.
py_platform = linux
py_inc_dir = %(sysroot)/usr/include/python%(py_major)%(py_minor)

# Where SIP will be installed.
sip_bin_dir = %(sysroot)/usr/bin
sip_module_dir = %(sysroot)/usr/lib/python%(py_major)/dist-packages
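
The four format rules are simple enough to sketch as a small parser. This is a minimal illustration under stated assumptions, not the parser SIP uses: parse_configuration is a hypothetical name, the preset values are invented, and a reference to an unknown name simply raises KeyError.

```python
import re

def parse_configuration(text, presets):
    """Parse 'name = value' lines, expanding %(name) references."""
    values = dict(presets)
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue                                # ignore blank lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected name = value: %r" % raw_line)
        # Replace each %(name) with the value already known for that name.
        expanded = re.sub(r"%\((\w+)\)",
                          lambda m: str(values[m.group(1)]),
                          value.strip())
        values[name.strip()] = expanded
    return values

sample = """
# The target Python installation.
py_platform = linux

# Where SIP will be installed.
sip_bin_dir = %(sysroot)/usr/bin   # trailing comment
sip_module_dir = %(sysroot)/usr/lib/python%(py_major)/dist-packages
"""

# Invented presets standing in for py_major, py_minor and --sysroot.
presets = {"py_major": 3, "py_minor": 4, "sysroot": "/opt/target"}
config = parse_configuration(sample, presets)

print(config["sip_bin_dir"])
print(config["sip_module_dir"])
```
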
The following values can be specified in the configuration file:
py_platform
is the target Python platform.
py_inc_dir
is the target Python include directory containing the Python.h file.
py_conf_inc_dir
is the target Python include directory containing the pyconfig.h file. If this isn’t specified
then it defaults to the value of py_inc_dir.
py_pylib_dir
is the target Python library directory.
sip_bin_dir
is the name of the target directory where the SIP code generator will be installed. It can be
overridden by the --bindir option.
sip_inc_dir
is the name of the target directory where the sip.h file will be installed. If this isn’t
specified then it defaults to the value of py_inc_dir. It can be overridden by the
--incdir option.
sip_module_dir
is the target directory where the sip module will be installed. It can be overridden by the
--destdir option.
sip_sip_dir
is the name of the target directory where generated .sip files will be installed by default. It
is only used when creating the sipconfig module. It can be overridden by the
--sipdir option.

2.7 Using SIP


Bindings are generated by the SIP code generator from a number of specification files, typically
with a .sip extension. Specification files look very similar to C and C++ header files, but often
with additional information (in the form of a directive or an annotation) and code so that the
bindings generated can be finely tuned.

2.7.1 A Simple C++ Example


We start with a simple example. Let’s say you have a (fictional) C++ library that implements a
single class called Word. The class has one constructor that takes a \0 terminated character string
as its single argument. The class has one method called reverse() which takes no arguments and
returns a \0 terminated character string. The interface to the class is defined in a header file
called word.h which might look something like this:

// Define the interface to the word library.

class Word {
    const char *the_word;

public:
    Word(const char *w);

    char *reverse() const;
};

The corresponding SIP specification file would then look something like this:

// Define the SIP wrapper to the word library.

%Module word

class Word {

%TypeHeaderCode
#include <word.h>
%End

public:
    Word(const char *w);

    char *reverse() const;
};

Obviously a SIP specification file looks very much like a C++ (or C) header file, but SIP does
not include a full C++ parser. Let’s look at the differences between the two files.
• The %Module directive has been added [1]. This is used to name the Python module that
is being created, word in this example.
• The %TypeHeaderCode directive has been added. The text between this and the
following %End directive is included literally in the code that SIP generates. Normally it
is used, as in this case, to #include the corresponding C++ (or C) header file [2].
• The declaration of the private variable the_word has been removed. SIP does not support
access to either private or protected instance variables.
If we want to we can now generate the C++ code in the current directory by running the
following command:

sip -c . word.sip
However, that still leaves us with the task of compiling the generated code and linking it against
all the necessary libraries. It’s much easier to use the SIP build system to do the whole thing.
Using the SIP build system is simply a matter of writing a small Python script. In this simple
example we will assume that the word library we are wrapping and its header file are installed in
standard system locations and will be found by the compiler and linker without having to specify
any additional flags. In a more realistic example your Python script may take command line
options, or search a set of directories to deal with different configurations and installations.
This is the simplest script (conventionally called configure.py):

import os
import sipconfig

# The name of the SIP build file generated by SIP and used by the build
# system.
build_file = "word.sbf"

# Get the SIP configuration information.
config = sipconfig.Configuration()

# Run SIP to generate the code.
os.system(" ".join([config.sip_bin, "-c", ".", "-b", build_file, "word.sip"]))

# Create the Makefile.
makefile = sipconfig.SIPModuleMakefile(config, build_file)

# Add the library we are wrapping. The name doesn't include any platform
# specific prefixes or extensions (e.g. the "lib" prefix on UNIX, or the
# ".dll" extension on Windows).
makefile.extra_libs = ["word"]

# Generate the Makefile itself.
makefile.generate()

Hopefully this script is self-documenting. The key parts are
the Configuration and SIPModuleMakefile classes. The build system contains other Makefile
classes, for example to build programs or to call other Makefiles in sub-directories.
After running the script (using the Python interpreter the extension module is being created for)
the generated C++ code and Makefile will be in the current directory.
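
The script above joins the command into one string for os.system(). A variation that avoids shell quoting problems (for example a path containing spaces) is to build the command as a list and pass it to subprocess.run(). The sketch below only assembles the list so that it can be tried without SIP installed. The helper build_sip_command and the /usr/bin/sip path are hypothetical, and only the -c, -b and -I flags that appear in this chapter are used.

```python
import shlex

def build_sip_command(sip_bin, build_file, spec_file,
                      include_dir=None, extra_flags=""):
    """Return the sip invocation as a list suitable for subprocess.run()."""
    command = [sip_bin, "-c", ".", "-b", build_file]
    if include_dir:
        command += ["-I", include_dir]
    # extra_flags is one string of flags; split it into separate arguments.
    command += shlex.split(extra_flags)
    command.append(spec_file)
    return command

# Hypothetical generator path; config.sip_bin would supply the real one.
cmd = build_sip_command("/usr/bin/sip", "word.sbf", "word.sip")
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it: subprocess.run(cmd, check=True)
```

With check=True a failing sip run raises an exception instead of being silently ignored, which os.system() does not do.
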
To compile and install the extension module, just run the following commands [3]:
make
make install
That’s all there is to it.
See Building Your Extension with distutils for an example of how to build this example using
distutils.

[1] All SIP directives start with a % as the first non-whitespace character of a line.

[2] SIP includes many code directives like this. They differ in where the supplied code is placed by
SIP in the generated code.

[3] On Windows you might run nmake or mingw32-make instead.

2.7.2 A More Complex C++ Example


In this second example we will wrap a fictional C++ library that contains a class that is derived from
a Qt class. This will demonstrate how SIP allows a class hierarchy to be split across multiple
Python extension modules, and will introduce SIP’s versioning system.
The library contains a single C++ class called Hello which is derived from Qt’s QLabel class. It
behaves just like QLabel except that the text in the label is hard coded to be Hello World. To
make the example more interesting we’ll also say that the library only supports Qt v4.2 and later,
and also includes a function called setDefault() that is not implemented in the Windows version
of the library.
The hello.h header file looks something like this:

// Define the interface to the hello library.

#include <qlabel.h>
#include <qwidget.h>
#include <qstring.h>

class Hello : public QLabel {
    // This is needed by the Qt Meta-Object Compiler.
    Q_OBJECT

public:
    Hello(QWidget *parent = 0);

private:
    // Prevent instances from being copied.
    Hello(const Hello &);
    Hello &operator=(const Hello &);
};

#if !defined(Q_OS_WIN)
void setDefault(const QString &def);
#endif

The corresponding SIP specification file would then look something like this:

// Define the SIP wrapper to the hello library.

%Module hello

%Import QtGui/QtGuimod.sip

%If (Qt_4_2_0 -)

class Hello : public QLabel {

%TypeHeaderCode
#include <hello.h>
%End

public:
    Hello(QWidget *parent /TransferThis/ = 0);

private:
    Hello(const Hello &);
};

%If (!WS_WIN)
void setDefault(const QString &def);
%End

%End

Again we look at the differences, but we’ll skip those that we’ve looked at in the previous example.
• The %Import directive has been added to specify that we are extending the class
hierarchy defined in the file QtGui/QtGuimod.sip. This file is part of PyQt4. The build
system will take care of finding the file’s exact location.
• The %If directive has been added to specify that everything [4] up to the
matching %End directive only applies to Qt v4.2 and later. Qt_4_2_0 is a tag defined
in QtCoremod.sip [5] using the %Timeline directive. %Timeline is used to define a tag for
each version of a library’s API you are wrapping allowing you to maintain all the
different versions in a single SIP specification. The build system provides support
to configure.py scripts for working out the correct tags to use according to which version
of the library is actually installed.
• The TransferThis annotation has been added to the constructor’s argument. It specifies
that if the argument is not 0 (i.e. the Hello instance being constructed has a parent) then
ownership of the instance is transferred from Python to C++. It is needed because Qt
maintains objects (i.e. instances derived from the QObject class) in a hierarchy. When an
object is destroyed all of its children are also automatically destroyed. It is important,
therefore, that the Python garbage collector doesn’t also try and destroy them. This is
covered in more detail in Ownership of Objects. SIP provides many other annotations that
can be applied to arguments, functions and classes. Multiple annotations are separated by
commas. Annotations may have values.
• The = operator has been removed. This operator is not supported by SIP.
• The %If directive has been added to specify that everything up to the
matching %End directive does not apply to Windows. WS_WIN is another tag defined by
PyQt4, this time using the %Platforms directive. Tags defined by the %Platforms directive
are mutually exclusive, i.e. only one may be valid at a time [6].
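
The effect of a version range such as (Qt_4_2_0 -) can be sketched in a few lines of Python. This is an illustration only, not SIP's implementation. The tag list is made up, the helper name in_version_range is hypothetical, and we assume that a range includes its lower bound, excludes its upper bound, and is open-ended when a bound is omitted.

```python
def in_version_range(timeline, installed, start=None, end=None):
    """Return True if the installed tag lies in the range [start, end)."""
    position = timeline.index(installed)
    lower = 0 if start is None else timeline.index(start)
    upper = len(timeline) if end is None else timeline.index(end)
    return lower <= position < upper

# Hypothetical timeline, oldest first; real tags are defined with %Timeline.
timeline = ["Qt_4_1_0", "Qt_4_2_0", "Qt_4_3_0"]

# Comparable to %If (Qt_4_2_0 -) with two different installed versions.
new_enough = in_version_range(timeline, "Qt_4_3_0", start="Qt_4_2_0")
too_old = in_version_range(timeline, "Qt_4_1_0", start="Qt_4_2_0")

print("Qt_4_3_0 selected:", new_enough)
print("Qt_4_1_0 selected:", too_old)
```
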
One question you might have at this point is why bother to define the private copy constructor
when it can never be called from Python? The answer is to prevent the automatic generation of a
public copy constructor.
We now look at the configure.py script. This is a little different to the script in the previous
example for two related reasons.
Firstly, PyQt4 includes a pure Python module called pyqtconfig that extends the SIP build system
for modules, like our example, that build on top of PyQt4. It deals with the details of which
version of Qt is being used (i.e. it determines what the correct tags are) and where it is installed.
This is called a module’s configuration module.
Secondly, we generate a configuration module (called helloconfig) for our own hello module.
There is no need to do this, but if there is a chance that somebody else might want to extend your
C++ library then it would make life easier for them.
Now we have two scripts. First the configure.py script:

import os
import sipconfig
from PyQt4 import pyqtconfig

# The name of the SIP build file generated by SIP and used by the build
# system.
build_file = "hello.sbf"

# Get the PyQt4 configuration information.
config = pyqtconfig.Configuration()

# Get the extra SIP flags needed by the imported PyQt4 modules. Note that
# this normally only includes those flags (-x and -t) that relate to SIP's
# versioning system.
pyqt_sip_flags = config.pyqt_sip_flags

# Run SIP to generate the code. Note that we tell SIP where to find the qt
# module's specification files using the -I flag.
os.system(" ".join([config.sip_bin, "-c", ".", "-b", build_file, "-I",
                    config.pyqt_sip_dir, pyqt_sip_flags, "hello.sip"]))

# We are going to install the SIP specification file for this module and
# its configuration module.
installs = []

installs.append(["hello.sip", os.path.join(config.default_sip_dir, "hello")])

installs.append(["helloconfig.py", config.default_mod_dir])

# Create the Makefile. The QtGuiModuleMakefile class provided by the
# pyqtconfig module takes care of all the extra preprocessor, compiler and
# linker flags needed by the Qt library.
makefile = pyqtconfig.QtGuiModuleMakefile(
    configuration=config,
    build_file=build_file,
    installs=installs
)

# Add the library we are wrapping. The name doesn't include any platform
# specific prefixes or extensions (e.g. the "lib" prefix on UNIX, or the
# ".dll" extension on Windows).
makefile.extra_libs = ["hello"]

# Generate the Makefile itself.
makefile.generate()

# Now we create the configuration module. This is done by merging a Python
# dictionary (whose values are normally determined dynamically) with a
# (static) template.
content = {
    # Publish where the SIP specifications for this module will be
    # installed.
    "hello_sip_dir": config.default_sip_dir,

    # Publish the set of SIP flags needed by this module. As these are the
    # same flags needed by the qt module we could leave it out, but this
    # allows us to change the flags at a later date without breaking
    # scripts that import the configuration module.
    "hello_sip_flags": pyqt_sip_flags
}

# This creates the helloconfig.py module from the helloconfig.py.in
# template and the dictionary.
sipconfig.create_config_module("helloconfig.py", "helloconfig.py.in", content)

Next we have the helloconfig.py.in template script:

from PyQt4 import pyqtconfig

# These are installation specific values created when Hello was configured.
# The following line will be replaced when this template is used to create
# the final configuration module.
# @SIP_CONFIGURATION@

class Configuration(pyqtconfig.Configuration):
    """The class that represents Hello configuration values.
    """
    def __init__(self, sub_cfg=None):
        """Initialise an instance of the class.

        sub_cfg is the list of sub-class configurations. It should be None
        when called normally.
        """
        # This is all standard code to be copied verbatim except for the
        # name of the module containing the super-class.
        if sub_cfg:
            cfg = sub_cfg
        else:
            cfg = []

        cfg.append(_pkg_config)

        pyqtconfig.Configuration.__init__(self, cfg)

class HelloModuleMakefile(pyqtconfig.QtGuiModuleMakefile):
    """The Makefile class for modules that %Import hello.
    """
    def finalise(self):
        """Finalise the macros.
        """
        # Make sure our C++ library is linked.
        self.extra_libs.append("hello")

        # Let the super-class do what it needs to.
        pyqtconfig.QtGuiModuleMakefile.finalise(self)

Again, we hope that the scripts are self documenting.
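
As a rough illustration of what merging a dictionary with a template means, the sketch below replaces the marker line with a dictionary literal. It is a simplified stand-in for sipconfig.create_config_module(), not its actual implementation. The function render_config_module and the sample values are hypothetical, and only the @SIP_CONFIGURATION@ marker and the _pkg_config name come from the template above.

```python
def render_config_module(template_text, content):
    """Replace the @SIP_CONFIGURATION@ marker line with a dictionary literal."""
    block = "_pkg_config = " + repr(content)
    lines = []
    for line in template_text.splitlines():
        if "@SIP_CONFIGURATION@" in line:
            lines.append(block)      # the marker line is swapped out entirely
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"

# A tiny made-up template; the real one is helloconfig.py.in.
template = (
    "# Installation specific values follow.\n"
    "# @SIP_CONFIGURATION@\n"
    "flags = _pkg_config['hello_sip_flags']\n"
)
content = {
    "hello_sip_dir": "/usr/share/sip",            # invented value
    "hello_sip_flags": "-x Example -t Qt_4_2_0",  # invented value
}

module_source = render_config_module(template, content)

# Execute the generated source to show that it is valid Python.
namespace = {}
exec(module_source, namespace)
print(namespace["flags"])
```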

[4] Some parts of a SIP specification aren’t subject to version control.

[5] Actually in versions.sip. PyQt4 uses the %Include directive to split the SIP specification for Qt
across a large number of separate .sip files.

[6] Tags can also be defined by the %Feature directive. These tags are not mutually exclusive, i.e.
any number may be valid at a time.

2.7.3 Ownership of Objects


When a C++ instance is wrapped a corresponding Python object is created. The Python object
behaves as you would expect in regard to garbage collection - it is garbage collected when its
reference count reaches zero. What then happens to the corresponding C++ instance? The
obvious answer might be that the instance’s destructor is called. However the library API may
say that when the instance is passed to a particular function, the library takes ownership of the
instance, i.e. responsibility for calling the instance’s destructor is transferred from the SIP
generated module to the library.
Ownership of an instance may also be associated with another instance. The implication is
that the owned instance will automatically be destroyed if the owning instance is destroyed. SIP
keeps track of these relationships to ensure that Python’s cyclic garbage collector can detect and
break any reference cycles between the owning and owned instances. The association is
implemented as the owning instance taking a reference to the owned instance.
The TransferThis, Transfer and TransferBack annotations are used to specify where, and in what
direction, transfers of ownership happen. It is very important that these are specified correctly to
avoid crashes (where both Python and C++ call the destructor) and memory leaks (where neither
Python nor C++ calls the destructor).
This applies equally to C structures where the structure is returned to the heap using
the free() function.
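
The bookkeeping involved can be pictured with a toy model in pure Python. Nothing here uses SIP. NativeInstance and PyWrapper are hypothetical classes that stand in for a C++ instance and its Python wrapper, and passing a parent mimics the effect of a TransferThis annotation.

```python
class NativeInstance:
    """Stands in for a C++ instance; counts destructor calls."""
    def __init__(self, name):
        self.name = name
        self.destructor_calls = 0
        self.children = []

    def destroy(self):
        self.destructor_calls += 1
        for child in self.children:      # a parent destroys its children
            child.destroy()


class PyWrapper:
    """Stands in for the Python object that wraps a native instance."""
    def __init__(self, native, parent=None):
        self.native = native
        self.python_owns = True
        if parent is not None:
            # Ownership moves from Python to the (C++) parent.
            parent.native.children.append(native)
            self.python_owns = False

    def collect(self):
        """Simulate Python garbage collecting the wrapper."""
        if self.python_owns:
            self.native.destroy()


parent = PyWrapper(NativeInstance("parent"))
child = PyWrapper(NativeInstance("child"), parent=parent)

child.collect()    # Python does not own the child: no destructor call
parent.collect()   # Python owns the parent: parent and child destroyed

print("parent destructor calls:", parent.native.destructor_calls)
print("child destructor calls:", child.native.destructor_calls)
```

If the wrapper for the child had kept python_owns set, its destructor would have run twice, which is the crash case described above.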

2.7.4 Types and Meta-types


Every Python object (with the exception of the object object itself) has a meta-type and at least
one super-type. By default an object’s meta-type is the meta-type of its first super-type.
SIP implements two super-types, sip.simplewrapper and sip.wrapper, and a meta-
type, sip.wrappertype.
sip.simplewrapper is the super-type of sip.wrapper. The super-type
of sip.simplewrapper is object.
sip.wrappertype is the meta-type of both sip.simplewrapper and sip.wrapper. The super-type
of sip.wrappertype is type.
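
The same relationships can be reproduced with ordinary Python classes, which may make the terminology easier to follow. The classes below are hypothetical stand-ins written in Python 3 syntax. They only mirror the hierarchy described here and are not the real sip types.

```python
class WrapperType(type):
    """Plays the role of sip.wrappertype (a meta-type derived from type)."""

class SimpleWrapper(object, metaclass=WrapperType):
    """Plays the role of sip.simplewrapper."""

class Wrapper(SimpleWrapper):
    """Plays the role of sip.wrapper."""

class Hello(Wrapper):
    """A made-up generated type; it inherits its meta-type from Wrapper."""

# The meta-type of a class is what type() returns for the class itself.
print("meta-type of Hello:", type(Hello).__name__)

# The super-types are the remaining entries of the method resolution order.
print("super-types:", [base.__name__ for base in Hello.__mro__[1:]])
```

Hello never names a meta-type, yet it gets WrapperType because that is the meta-type of its first super-type.
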
sip.wrapper supports the concept of object ownership described in Ownership of Objects and, by
default, is the super-type of all the types that SIP generates.
sip.simplewrapper does not support the concept of object ownership but SIP generated types that
are sub-classed from it have Python objects that take less memory.
SIP allows a class’s meta-type and super-type to be explicitly specified using
the Metatype and Supertype class annotations.
SIP also allows the default meta-type and super-type to be changed for a module using
the %DefaultMetatype and %DefaultSupertype directives. Unlike the default super-type, the
default meta-type is inherited by importing modules.
If you want to use your own meta-type or super-type then they must be sub-classed from one of
the SIP provided types. Your types must be registered using sipRegisterPyType(). This is
normally done in code specified using the %InitialisationCode directive.
As an example, PyQt4 uses %DefaultMetatype to specify a new meta-type that handles the
interaction with Qt’s own meta-type system. It also uses %DefaultSupertype to specify that the
smaller sip.simplewrapper super-type is normally used. Finally it uses Supertype as an annotation
of the QObject class to override the default and use sip.wrapper as the super-type so that the
parent/child relationships of QObject instances are properly maintained.

2.7.5 Lazy Type Attributes


Instead of populating a wrapped type’s dictionary with its attributes (or descriptors for those
attributes) SIP only creates objects for those attributes when they are actually needed. This is
done to reduce the memory footprint and start up time when used to wrap large libraries with
hundreds of classes and tens of thousands of attributes.
SIP allows you to extend the handling of lazy attributes to your own attribute types by allowing
you to register an attribute getter handler (using sipRegisterAttributeGetter()). This will be called
just before a type’s dictionary is accessed for the first time.
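
The idea of creating attributes only on first use can be imitated in pure Python with a meta-type that implements __getattr__. This is a sketch of the general technique, not of SIP's internals. LazyType, Wrapped and make_expensive are hypothetical names.

```python
class LazyType(type):
    """Meta-type that builds class attributes the first time they are used."""
    def __getattr__(cls, name):
        # Only called when normal lookup on the class has failed.
        factories = cls.__dict__.get("_lazy_factories", {})
        if name not in factories:
            raise AttributeError(name)
        value = factories[name]()      # create the attribute on demand
        setattr(cls, name, value)      # cache it in the type's dictionary
        return value

created = []

def make_expensive():
    created.append("expensive")        # record each time the factory runs
    return "built on first access"

class Wrapped(metaclass=LazyType):
    _lazy_factories = {"expensive": make_expensive}

print("present before use:", "expensive" in Wrapped.__dict__)
first = Wrapped.expensive
second = Wrapped.expensive
print("present after use:", "expensive" in Wrapped.__dict__)
print("factory calls:", len(created))
```

The second access finds the cached value in the type's dictionary, so the factory runs only once and attributes that are never touched cost nothing.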

2.8 Support for Python’s Buffer Interface


SIP supports Python’s buffer interface in that whenever C/C++ requires a char or char * type
then any Python type that supports the buffer interface (including ordinary Python strings) can be
used.
If a buffer is made up of a number of segments then all but the first will be ignored.
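From the Python side, the buffer interface can be explored with the built-in memoryview type. The sketch below is written for Python 3, where bytes and bytearray support the interface but str does not (the "ordinary Python strings" mentioned above are Python 2 strings). The helper name first_bytes is an assumption made for this example and is not part of SIP.

```python
def first_bytes(data, count=4):
    """Return the first `count` bytes of an object that supports the buffer interface."""
    # memoryview raises TypeError if the object does not support the interface.
    view = memoryview(data)
    return view[:count].tobytes()


print(first_bytes(b"scanned page"))
print(first_bytes(bytearray(b"image data"), 5))
```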

2.9 Support for Wide Characters


SIP v4.6 introduced support for wide characters (i.e. the wchar_t type). Python’s C API includes
support for converting between unicode objects and wide character strings and arrays. When
converting from a unicode object to wide characters SIP creates the string or array on the heap
(using memory allocated using sipMalloc()). This then raises the problem of how this memory is
subsequently freed.
The following describes how SIP handles this memory in the different situations where this is an
issue.
• When a wide string or array is passed to a function or method then the memory is freed
(using sipFree()) after that function or method returns.
• When a wide string or array is returned from a virtual method then SIP does not free the
memory until the next time the method is called.
• When an assignment is made to a wide string or array instance variable then SIP does not
first free the instance’s current string or array.

2.10 The Python Global Interpreter Lock


Python’s Global Interpreter Lock (GIL) must be acquired before calls can be made to the Python
API. It should also be released when a potentially blocking call to a C/C++ library is made in order
to allow other Python threads to be executed. In addition, some C/C++ libraries may implement
their own locking strategies that conflict with the GIL causing application deadlocks. SIP
provides ways of specifying when the GIL is released and acquired to ensure that locking
problems can be avoided.
SIP always ensures that the GIL is acquired before making calls to the Python API. By default
SIP does not release the GIL when making calls to the C/C++ library being wrapped.
The ReleaseGIL annotation can be used to override this behaviour when required.
If SIP is given the -g command line option then the default behaviour is changed and SIP releases
the GIL every time it makes calls to the C/C++ library being wrapped. The HoldGIL annotation
can be used to override this behaviour when required.

2.11 Building a Private Copy of the sip Module


New in version 4.12.
The sip module is intended to be used by all the SIP generated modules of a particular Python
installation. For example PyQt3 and PyQt4 are completely independent of each other but will use
the same sip module. However, this means that all the generated modules must be built against a
compatible version of SIP. If you do not have complete control over the Python installation then
this may be difficult or even impossible to achieve.
To get around this problem you can build a private copy of the sip module that has a different
name and/or is placed in a different Python package. To do this you use the --sip-module option
to specify the name (optionally including a package name) of your private copy.
As well as building the private copy of the module, the version of the sip.h header file will also
be specific to the private copy. You will probably also want to use the --incdir option to specify
the directory where the header file will be installed to avoid overwriting a copy of the default
version that might already be installed.
When building your generated modules you must ensure that they #include the private copy
of sip.h instead of any default version.

The SIP Command Line


The syntax of the SIP command line is:

sip [options] [specification]


specification is the name of the specification file for the module. If it is omitted then stdin is
used.
The full set of command line options is:
-h
Display a help message.
-V
Display the SIP version number.
-a <FILE>
Deprecated since version 4.18.
The name of the QScintilla API file to generate. This file contains a description of the
module API in a form that the QScintilla editor component can use for auto-completion
and call tips. (The file may also be used by the SciTE editor but must be sorted first.) By
default the file is not generated.
-b <FILE>
The name of the build file to generate. This file contains the information about the module
needed by the SIP build system to generate a platform and compiler specific Makefile for
the module. By default the file is not generated.
-B <TAG>
New in version 4.16.
The tag is added to the list of backstops. The option may be given more than once if
multiple timelines have been defined. See the %Timeline directive for more details.
-c <DIR>
The name of the directory (which must exist) into which all of the generated C or C++
code is placed. By default no code is generated.
-d <FILE>
Deprecated since version 4.12: Use the -X option instead.
The name of the documentation file to generate. Documentation is included in specification
files using the %Doc and %ExportedDoc directives. By default the file is not generated.
-e
Support for C++ exceptions is enabled. This causes all calls to C++ code to be enclosed
in try/catch blocks and C++ exceptions to be converted to Python exceptions. By default
exception support is disabled.
-f
New in version 4.18.
Warnings are handled as if they were errors and the program terminates.
-g
The Python GIL is released before making any calls to the C/C++ library being wrapped
and reacquired afterwards. See The Python Global Interpreter Lock and
the ReleaseGIL and HoldGIL annotations.
-I <DIR>
The directory is added to the list of directories searched when looking for a specification
file given in an %Include or %Import directive. Directory separators must always be /. This
option may be given any number of times.
-j <NUMBER>
The generated code is split into the given number of files. This makes it easier to use the
parallel build facility of most modern implementations of make. By default 1 file is
generated for each C structure or C++ class.
-k
New in version 4.10.
Deprecated since version 4.12: Use the keyword_arguments="All" %Module directive
argument instead.
All functions and methods will, by default, support passing parameters using the Python
keyword argument syntax.
-o
New in version 4.10.
Docstrings will be automatically generated that describe the signature of all functions,
methods and constructors.
-p <MODULE>
The name of the %ConsolidatedModule which will contain the wrapper code for this
component module.
-P
New in version 4.10.
By default SIP generates code to provide access to protected C++ functions from Python.
On some platforms (notably Linux, but not Windows) this code can be avoided if
the protected keyword is redefined as public during compilation. This can result in a
significant reduction in the size of a generated Python module. This option disables the
generation of the extra code.
-r
Debugging statements that trace the execution of the bindings are automatically generated.
By default the statements are not generated.
-s <SUFFIX>
The suffix to use for generated C or C++ source files. By default .c is used for C
and .cpp for C++.
-t <TAG>
The SIP version tag (declared using a %Timeline directive) or the SIP platform tag
(declared using the %Platforms directive) to generate code for. This option may be given
any number of times so long as the tags do not conflict.
-T
Deprecated since version 4.16.6: This option is now ignored and timestamps are always
disabled.
By default the generated C and C++ source and header files include a timestamp specifying
when they were generated. This option disables the timestamp so that the contents of the
generated files remain constant for a particular version of SIP.
-w
The display of warning messages is enabled. By default warning messages are disabled.
-x <FEATURE>
The feature (declared using the %Feature directive) is disabled.
-X <ID:FILE>
New in version 4.12.
The extract (defined with the %Extract directive) with the identifier ID is written to the
file FILE.
-y <FILE>
New in version 4.18.
The name of the Python type hints stub file to generate. This file contains a description of
the module API that is compliant with PEP 484. By default the file is not generated.
-z <FILE>
Deprecated since version 4.16.6: Use the @<FILE> style instead.
The name of a file containing more command line options.
Command line options can also be placed in a file and passed on the command line using
the @ prefix.
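A build script usually assembles this command line programmatically. The following is a minimal sketch in Python 3 that uses only options described in the list above (-c, -e, -g, -j and -I). The function name build_sip_command, its parameters and the file names in the example are hypothetical.

```python
def build_sip_command(spec, code_dir, include_dirs=(), exceptions=False,
                      release_gil=False, parts=None):
    """Return the argument list for a sip invocation (the command is not run)."""
    args = ["sip", "-c", code_dir]        # -c: directory for the generated code
    if exceptions:
        args.append("-e")                 # -e: enable C++ exception support
    if release_gil:
        args.append("-g")                 # -g: release the GIL around library calls
    if parts is not None:
        args += ["-j", str(parts)]        # -j: split the code into this many files
    for directory in include_dirs:
        # -I requires / as the directory separator on every platform.
        args += ["-I", directory.replace("\\", "/")]
    args.append(spec)                     # the specification file comes last
    return args


print(build_sip_command("word.sip", "build", include_dirs=["sip/common"],
                        exceptions=True, parts=4))
```

The resulting list can be handed to subprocess.call() once the directory named by -c exists.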

2.12 SIP Specification Files


A SIP specification consists of some C/C++ type and function declarations and some directives.
The declarations may contain annotations which provide SIP with additional information that
cannot be expressed in C/C++. SIP does not include a full C/C++ parser.
It is important to understand that a SIP specification describes the Python API, i.e. the API
available to the Python programmer when they import the generated module. It does not have to
accurately represent the underlying C/C++ library. There is nothing wrong with omitting
functions that make little sense in a Python context, or adding functions implemented with
handwritten code that have no C/C++ equivalent. It is even possible (and sometimes necessary)
to specify a different super-class hierarchy for a C++ class. All that matters is that the generated
code compiles properly.
In most cases the Python API matches the C/C++ API. In some cases handwritten code
(see %MethodCode) is used to map from one to the other without SIP having to know the details
itself. However, there are a few cases where SIP generates a thin wrapper around a C++ method
or constructor (see Generated Derived Classes) and needs to know the exact C++ signature. To
deal with these cases SIP allows two signatures to be specified. For example:

class Klass
{
public:
    // The Python signature is a tuple, but the underlying C++ signature
    // is a 2 element array.
    Klass(SIP_PYTUPLE) [(int *)];
%MethodCode
        int iarr[2];

        if (PyArg_ParseTuple(a0, "ii", &iarr[0], &iarr[1]))
        {
            // Note that we use the SIP generated derived class
            // constructor.
            Py_BEGIN_ALLOW_THREADS
            sipCpp = new sipKlass(iarr);
            Py_END_ALLOW_THREADS
        }
%End
};

2.13 Variable Numbers of Arguments


SIP supports the use of ... as the last part of a function signature. Any remaining arguments are
collected as a Python tuple.

2.14 Additional SIP Types


SIP supports a number of additional data types that can be used in Python signatures.
SIP_ANYSLOT
Deprecated since version 4.18.
This is both a const char * and a PyObject * that is used as the type of the member instead
of const char * in functions that implement the connection or disconnection of an explicitly
generated signal to a slot. Handwritten code must be provided to interpret the conversion
correctly.
SIP_PYBUFFER
This is a PyObject * that implements the Python buffer protocol.
SIP_PYCALLABLE
This is a PyObject * that is a Python callable object.
SIP_PYDICT
This is a PyObject * that is a Python dictionary object.
SIP_PYLIST
This is a PyObject * that is a Python list object.
SIP_PYOBJECT
This is a PyObject * of any Python type. The type PyObject * can also be used.
SIP_PYSLICE
This is a PyObject * that is a Python slice object.
SIP_PYTUPLE
This is a PyObject * that is a Python tuple object.
SIP_PYTYPE
This is a PyObject * that is a Python type object.
SIP_QOBJECT
Deprecated since version 4.18.
This is a QObject * that is a C++ instance of a class derived from Qt’s QObject class.
SIP_RXOBJ_CON
Deprecated since version 4.18.
This is a QObject * that is a C++ instance of a class derived from Qt’s QObject class. It is used
as the type of the receiver instead of const QObject * in functions that implement a connection to
a slot.
SIP_RXOBJ_DIS
Deprecated since version 4.18.
This is a QObject * that is a C++ instance of a class derived from Qt’s QObject class. It is used
as the type of the receiver instead of const QObject * in functions that implement a disconnection
from a slot.
SIP_SIGNAL
Deprecated since version 4.18.
This is a const char * that is used as the type of the signal instead of const char * in functions that
implement the connection or disconnection of an explicitly generated signal to a slot.
SIP_SLOT
Deprecated since version 4.18.
This is a const char * that is used as the type of the member instead of const char * in functions
that implement the connection or disconnection of an explicitly generated signal to a slot.
SIP_SLOT_CON
Deprecated since version 4.18.
This is a const char * that is used as the type of the member instead of const char * in functions
that implement the connection of an internally generated signal to a slot. The type includes a
comma separated list of types that is the C++ signature of the signal.
To take an example, QAccel::connectItem() connects an internally generated signal to a slot. The
signal is emitted when the keyboard accelerator is activated and it has a single integer argument
that is the ID of the accelerator. The C++ signature is:
bool connectItem(int id, const QObject *receiver, const char *member);
The corresponding SIP specification is:
bool connectItem(int, SIP_RXOBJ_CON, SIP_SLOT_CON(int));
SIP_SLOT_DIS
Deprecated since version 4.18.
This is a const char * that is used as the type of the member instead of const char * in functions
that implement the disconnection of an internally generated signal to a slot. The type includes a
comma separated list of types that is the C++ signature of the signal.
SIP_SSIZE_T
This is a Py_ssize_t in Python v2.5 and later and int in earlier versions of Python.

2.15 Python API for Applications


The main purpose of the sip module is to provide functionality common to all SIP generated
bindings. It is loaded automatically and most of the time you will completely ignore it. However,
it does expose some functionality that can be used by applications.
class sip.array
New in version 4.15.
This is the type object for the type SIP uses to represent an array of a limited number of
C/C++ types. Typically the memory is not owned by Python so that it is not freed when the
object is garbage collected. A sip.array object can be created from a sip.voidptr object by
calling sip.voidptr.asarray(). This allows the underlying memory (interpreted as a sequence
of unsigned bytes) to be processed much more quickly.
sip.cast(obj, type) → object
This does the Python equivalent of casting a C++ instance to one of its sub or super-class
types.
Parameters: • obj – the Python object.
• type – the type.
Returns: a new Python object that wraps the same C++ instance as obj, but has
the type type.
sip.delete(obj)
For C++ instances this calls the C++ destructor. For C structures it returns the structure’s
memory to the heap.
Parameters: obj – the Python object.
sip.dump(obj)
This displays various bits of useful information about the internal state of the Python object
that wraps a C++ instance or C structure.
Parameters: obj – the Python object.
sip.enableautoconversion(type, enable) → bool
New in version 4.14.7.
Instances of some classes may be automatically converted to other Python objects even
though the class has been wrapped. This allows that behaviour to be suppressed so that an
instance of the wrapped class is returned instead.
Parameters:• type – the Python type object.
• enable – is True if auto-conversion should be enabled for the type. This is
the default behaviour.
Returns: True or False depending on whether or not auto-conversion was previously
enabled for the type. This allows the previous state to be restored later on.
sip.getapi(name) → version
New in version 4.9.
This returns the version number that has been set for an API. The version number is either
set explicitly by a call to sip.setapi() or implicitly by importing the module that defines it.
Parameters: name – the name of the API.
Returns: The version number that has been set for the API. An exception will be
raised if the API is unknown.
sip.isdeleted(obj) → bool
This checks if the C++ instance or C structure has been deleted and returned to the heap.
Parameters: obj – the Python object.
Returns: True if the C/C++ instance has been deleted.
sip.ispycreated(obj) → bool
New in version 4.12.1.
This checks if the C++ instance or C structure was created by Python. If it was then it is
possible to call a C++ instance’s protected methods.
Parameters: obj – the Python object.
Returns: True if the C/C++ instance was created by Python.
sip.ispyowned(obj) → bool
This checks if the C++ instance or C structure is owned by Python.
Parameters: obj – the Python object.
Returns: True if the C/C++ instance is owned by Python.
sip.setapi(name, version)
New in version 4.9.
This sets the version number of an API. An exception is raised if a different version
number has already been set, either explicitly by a previous call, or implicitly by importing
the module that defines it.
Parameters:• name – the name of the API.
• version – The version number to set for the API. Version numbers must be
greater than or equal to 1.
sip.setdeleted(obj)
This marks the C++ instance or C structure as having been deleted and returned to the heap
so that future references to it raise an exception rather than cause a program crash.
Normally SIP handles such things automatically, but there may be circumstances where
this isn’t possible.
Parameters: obj – the Python object.
sip.setdestroyonexit(destroy)
New in version 4.14.2.
When the Python interpreter exits it garbage collects those objects that it can. This means
that any corresponding C++ instances and C structures owned by Python are destroyed.
Unfortunately this happens in an unpredictable order and so can cause memory faults
within the wrapped library. Calling this function with a value of False disables the
automatic destruction of C++ instances and C structures.
Parameters: destroy – True if all C++ instances and C structures owned by Python
should be destroyed when the interpreter exits. This is the default.
sip.settracemask(mask)
If the bindings have been created with SIP’s -r command line option then the generated
code will include debugging statements that trace the execution of the code. (It is
particularly useful when trying to understand the operation of a C++ library’s virtual
function calls.)
Parameters: mask – the mask that determines which debugging statements are enabled.
Debugging statements are generated at the following points:
• in a C++ virtual function (mask is 0x0001)
• in a C++ constructor (mask is 0x0002)
• in a C++ destructor (mask is 0x0004)
• in a Python type’s __init__ method (mask is 0x0008)
• in a Python type’s __del__ method (mask is 0x0010)
• in a Python type’s ordinary method (mask is 0x0020).
By default the trace mask is zero and all debugging statements are disabled.
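The mask values listed above are single bits, so several trace points can be enabled at once by OR-ing them together. The sketch below (Python 3) takes the numeric values from the list; the constant names and the helper make_trace_mask are hypothetical and are not defined by the sip module.

```python
# Bit values from the list above; the names are made up for this example.
TRACE_VIRTUAL = 0x0001   # C++ virtual function
TRACE_CTOR = 0x0002      # C++ constructor
TRACE_DTOR = 0x0004      # C++ destructor
TRACE_INIT = 0x0008      # Python type's __init__ method
TRACE_DEL = 0x0010       # Python type's __del__ method
TRACE_METHOD = 0x0020    # Python type's ordinary method


def make_trace_mask(*flags):
    """Combine individual trace bits into a single mask."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


mask = make_trace_mask(TRACE_CTOR, TRACE_DTOR)
print(hex(mask))
# With bindings generated using -r, this value would be passed to
# sip.settracemask(mask); that call is left out so the sketch runs without sip.
```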
class sip.simplewrapper
This is an alternative type object that can be used as the base type of an instance wrapped
by SIP. Objects using this are smaller than those that use the default sip.wrapper type but
do not support the concept of object ownership.
sip.SIP_VERSION
New in version 4.2.
This is a Python integer object that represents the SIP version number as a 3 part
hexadecimal number (e.g. v4.0.0 is represented as 0x040000).
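The packed number can be split back into its three parts with shifts and masks. This Python 3 sketch assumes each part occupies one byte, which is consistent with the example 0x040000 for v4.0.0. The function name decode_sip_version is hypothetical.

```python
def decode_sip_version(version):
    """Split a packed 3 part version number into (major, minor, patch)."""
    major = (version >> 16) & 0xFF   # assumption: one byte per part
    minor = (version >> 8) & 0xFF
    patch = version & 0xFF
    return major, minor, patch


print(decode_sip_version(0x040000))
print(decode_sip_version(0x041201))
```

Because the parts are packed most significant first, two versions can also be compared directly as integers.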
sip.SIP_VERSION_STR
New in version 4.3.
This is a Python string object that defines the SIP version number as represented as a
string. For development versions it will contain either .dev or -snapshot-.
sip.transferback(obj)
This function is a wrapper around sipTransferBack().
sip.transferto(obj, owner)
This function is a wrapper around sipTransferTo().
sip.unwrapinstance(obj) → integer
This returns the address, as an integer, of a wrapped C/C++ structure or class instance.
Parameters: obj – the Python object.
Returns: an integer that is the address of the C/C++ instance.
class sip.voidptr
This is the type object for the type SIP uses to represent a C/C++ void *. It may have a size
associated with the address in which case the Python buffer interface is supported. The
type has the following methods.
__init__(address[, size=-1[, writeable=True]])
Parameters:• address – the address, either another sip.voidptr, None, a Python
Capsule, a Python CObject, an object that implements the buffer
protocol or an integer.
• size – the optional associated size of the block of memory and is
negative if the size is not known.
• writeable – set if the memory is writeable. If it is not specified,
and address is a sip.voidptr instance then its value will be used.
__int__() → integer
This returns the address as an integer.
Returns: the integer address.
__getitem__(idx) → item
New in version 4.12.
This returns the item at a given index. An exception will be raised if the address does not
have an associated size. In this way it behaves like a Python memoryview object.
Parameters: idx – is the index which may either be an integer, an object that
implements __index__() or a slice object.
Returns: the item. If the index is an integer then the item will be a Python
v2 string object or a Python v3 bytes object containing the single
byte at that index. If the index is a slice object then the item will
be a new voidptr object defining the subset of the memory
corresponding to the slice.
__hex__() → string
This returns the address as a hexadecimal string.
Returns: the hexadecimal string address.
__len__() → integer
New in version 4.12.
This returns the size associated with the address.
Returns: the associated size. An exception will be raised if there is none.
__setitem__(idx, item)
New in version 4.12.
This updates the memory at a given index. An exception will be raised if the address does
not have an associated size or is not writable. In this way it behaves like a
Python memoryview object.
Parameters:• idx – is the index which may either be an integer, an object that
implements __index__() or a slice object.
• item – is the data that will update the memory defined by the
index. It must implement the buffer interface and be the same
size as the data that is being updated.
asarray([size=-1]) → sip.array
New in version 4.16.5.
This returns the block of memory as a sip.array object. The memory is not copied.
Parameters: size – the size of the array. If it is negative then the size
associated with the address is used. If there is no associated size
then an exception is raised.
Returns: the sip.array object.
ascapsule() → capsule
New in version 4.10.
This returns the address as an unnamed Python Capsule. This requires Python v3.1 or later
or Python v2.7 or later.
Returns: the Capsule.
ascobject() → cObject
This returns the address as a Python CObject. This is deprecated with Python v3.1 and is
not supported with Python v3.2 and later.
Returns: the CObject.
asstring([size=-1]) → string/bytes
This returns a copy of the block of memory as a Python v2 string object or a Python v3
bytes object.
Parameters: size – the number of bytes to copy. If it is negative then the size
associated with the address is used. If there is no associated size
then an exception is raised.
Returns: the string or bytes object.
getsize() → integer
This returns the size associated with the address.
Returns: the associated size which will be negative if there is none.
setsize(size)
This sets the size associated with the address.
Parameters: size – the size to associate. If it is negative then no size is
associated.
getwriteable() → bool
This returns the writeable state of the memory.
Returns: True if the memory is writeable.
setwriteable(writeable)
This sets the writeable state of the memory.
Parameters: writeable – the writeable state to set.
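Since a sized sip.voidptr is described as behaving like a memoryview, the indexing rules can be tried out with Python's built-in memoryview without installing SIP. This is only an analogy, written for Python 3. One difference to keep in mind is that a memoryview returns an int for an integer index, whereas the text above says sip.voidptr returns a single byte string. The helper name overwrite is hypothetical.

```python
def overwrite(buffer, start, data):
    """Overwrite len(data) bytes of a writable buffer in place, starting at `start`."""
    view = memoryview(buffer)
    # The slice and the new data must be the same size, otherwise a
    # ValueError is raised; read-only memory raises TypeError.
    view[start:start + len(data)] = data


page = bytearray(b"OCR text")
overwrite(page, 0, b"ocr")
print(page)

view = memoryview(page)
print(len(view))          # the size associated with the memory
print(bytes(view[4:8]))   # a slice refers to a subset of the same memory
```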
sip.wrapinstance(addr, type) → object
This wraps a C structure or C++ class instance in a Python object. If the instance has
already been wrapped then a new reference to the existing object is returned.
Parameters:• addr – the address of the instance as a number.
• type – the Python type of the instance.
Returns: the Python object that wraps the instance.
class sip.wrapper
This is the type object of the default base type of all instances wrapped by SIP.
The Supertype class annotation can be used to specify a different base type for a class.
class sip.wrappertype
This is the type object of the metatype of the sip.wrapper type.
Chapter 3: pyTesser
3.1 Introduction:
PyTesser is an Optical Character Recognition module for Python. It takes as input an image or
image file and outputs a string.
PyTesser uses the Tesseract OCR engine (an Open Source project at Google), converting images
to an accepted format and calling the Tesseract executable as an external script. A Windows
executable is provided along with the Python scripts. The scripts should work in Linux as well.
PyTesser: http://code.google.com/p/pytesser/
Tesseract: http://code.google.com/p/tesseract-ocr/
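Calling Tesseract as an external program amounts to building a command line and then reading the text file that Tesseract writes. The sketch below (Python 3) only builds the command and the expected output path. It relies on the basic Tesseract invocation "tesseract imagename outputbase", which writes outputbase.txt. The function names and file names are hypothetical, and the actual PyTesser code may be organised differently.

```python
def tesseract_command(image_path, output_base, exe="tesseract"):
    """Return the argument list for a basic Tesseract run (the command is not run)."""
    return [exe, image_path, output_base]


def output_text_path(output_base):
    """Tesseract appends .txt to the output base name."""
    return output_base + ".txt"


cmd = tesseract_command("phototest.tif", "temp")
print(cmd)
print(output_text_path("temp"))
```

A wrapper would pass the list to subprocess.call() and then read the text file to obtain the recognised string.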

3.2 Dependencies:
PIL is required to work with images in memory. PyTesser has been tested with Python 2.4 in
Windows XP. http://www.pythonware.com/products/pil/

3.3 Installation:
PyTesser has no installation functionality in this release. Extract pytesser.zip into the directory
that holds your other scripts. The necessary files are listed in File Dependencies below.

3.4 Usage:
>>> from pytesser import *
>>> im = Image.open('phototest.tif')
>>> text = image_to_string(im)
>>> print text
This is a lot of 12 point text to test the ocr
code and see if it works on all types of file format. The quick brown dog
jumped over the lazy fox. The quick brown dog jumped over the lazy
fox. The quick brown dog jumped over the lazy fox. The quick brown
dog jumped over the lazy fox.

>>> try:
...     text = image_file_to_string('fnord.tif', graceful_errors=False)
... except errors.Tesser_General_Exception, value:
...     print "fnord.tif is incompatible filetype. Try graceful_errors=True"
...     print value
...
fnord.tif is incompatible filetype. Try graceful_errors=True
Tesseract Open Source OCR Engine
read_tif_image:Error:Illegal image format:Compression
Tessedit:Error:Read of file failed:fnord.tif
Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3

>>> text = image_file_to_string('fnord.tif', graceful_errors=True)
>>> print "fnord.tif contents:", text
fnord.tif contents: fnord

>>> text = image_file_to_string('fonts_test.png', graceful_errors=True)
>>> print text
12 pt And Arnazwngw few dwscotheques provwde jukeboxes Tames
Amazmgly few dnscotheques pmvxde Jukeboxes 24 pt: Arial:
Amazingly few discotheques provide jul

3.4 File Dependencies:


pytesser.py      Main module for importing
util.py          Utility functions used by pytesser.py
errors.py        Interprets exceptions thrown by Tesseract
tesseract.exe    Executable called by pytesser.py
tessdata/        Resources used by tesseract.exe

3.5 Python Image Library


3.5.1 Introduction
The Python Imaging Library adds image processing capabilities to your Python interpreter.
This library provides extensive file format support, an efficient internal representation, and fairly
powerful image processing capabilities.
The core image library is designed for fast access to data stored in a few basic pixel formats. It
should provide a solid foundation for a general image processing tool.
Let’s look at a few possible uses of this library:

3.5.2 Image Archives


The Python Imaging Library is ideal for image archival and batch processing applications.
You can use the library to create thumbnails, convert between file formats, print images, etc.
The current version identifies and reads a large number of formats. Write support is intentionally
restricted to the most commonly used interchange and presentation formats.

3.5.3 Image Display


The current release includes Tk PhotoImage and BitmapImage interfaces, as well as a Windows
DIB interface that can be used with PythonWin and other Windows-based toolkits. Many other
GUI toolkits come with some kind of PIL support.
For debugging, there’s also a show method which saves an image to disk, and calls an external
display utility.

3.5.4 Image Processing


The library contains basic image processing functionality, including point operations, filtering
with a set of built-in convolution kernels, and colour space conversions.
The library also supports image resizing, rotation and arbitrary affine transforms.
There’s a histogram method allowing you to pull some statistics out of an image. This can be
used for automatic contrast enhancement, and for global statistical analysis.
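To make the link between a histogram and contrast enhancement concrete, the following Python 3 sketch works on a plain list of greyscale values rather than a PIL image. It counts how often each level occurs and then stretches the used range to the full 0..255 scale. The function names are hypothetical, and this linear stretch is just one simple approach, not necessarily what PIL does internally.

```python
def histogram(pixels, levels=256):
    """Count how many pixels have each grey level."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts


def stretch_contrast(pixels, levels=256):
    """Linearly map the darkest used level to 0 and the brightest to levels - 1."""
    counts = histogram(pixels, levels)
    used = [level for level, count in enumerate(counts) if count]
    if not used:
        return []
    low, high = used[0], used[-1]
    if low == high:
        return list(pixels)   # a flat image has no contrast to stretch
    # Integer arithmetic keeps the result exact and within range.
    return [(value - low) * (levels - 1) // (high - low) for value in pixels]


print(histogram([0, 0, 2], levels=4))
print(stretch_contrast([100, 150, 200]))
```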

3.5.5 Using the Image Class


The most important class in the Python Imaging Library is the Image class, defined in the
module with the same name. You can create instances of this class in several ways; either by
loading images from files, processing other images, or creating images from scratch.
To load an image from a file, use the open function in the Image module.

>>> import Image
>>> im = Image.open("lena.ppm")

If successful, this function returns an Image object. You can now use instance
attributes to examine the file contents.

>>> print im.format, im.size, im.mode
PPM (512, 512) RGB

The format attribute identifies the source of an image. If the image was not read from a file, it is
set to None. The size attribute is a 2-tuple containing width and height (in pixels). The mode
attribute defines the number and names of the bands in the image, and also the pixel type and
depth. Common modes are “L” (luminance) for greyscale images, “RGB” for true colour images,
and “CMYK” for pre-press images.
If the file cannot be opened, an IOError exception is raised.
Once you have an instance of the Image class, you can use the methods defined by this class to
process and manipulate the image. For example, let’s display the image we just loaded:

>>> im.show()

(The standard version of show is not very efficient, since it saves the image to a temporary file
and calls the xv utility to display the image. If you don’t have xv installed, it won’t even work.
When it does work though, it is very handy for debugging and tests.)
The following sections provide an overview of the different functions provided in this library.

Reading and Writing Images


The Python Imaging Library supports a wide variety of image file formats. To read files from
disk, use the open function in the Image module. You don’t have to know the file format to open
a file. The library automatically determines the format based on the contents of the file.
To save a file, use the save method of the Image class. When saving files, the name becomes
important. Unless you specify the format, the library uses the filename extension to discover
which file storage format to use.
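The extension lookup can be pictured as a small table keyed by file suffix. The Python 3 sketch below is an illustration only: the table EXTENSION_FORMATS is a hypothetical subset, the function name guess_format is made up, and the library keeps its own, larger registry.

```python
import os

# Hypothetical subset of an extension-to-format table.
EXTENSION_FORMATS = {
    ".jpg": "JPEG", ".jpeg": "JPEG",
    ".png": "PNG", ".gif": "GIF",
    ".bmp": "BMP", ".ppm": "PPM",
    ".tif": "TIFF", ".tiff": "TIFF",
}


def guess_format(filename, explicit_format=None):
    """Pick a storage format: an explicit format wins, else use the extension."""
    if explicit_format is not None:
        return explicit_format
    extension = os.path.splitext(filename)[1].lower()
    try:
        return EXTENSION_FORMATS[extension]
    except KeyError:
        raise ValueError("unknown extension %r; pass a format explicitly" % extension)


print(guess_format("scan.JPG"))
print(guess_format("page.thumbnail", "JPEG"))
```

A non-standard extension such as .thumbnail is not in the table, so in that case the format has to be given explicitly.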

Convert files to JPEG

import os, sys
import Image

for infile in sys.argv[1:]:
    f, e = os.path.splitext(infile)
    outfile = f + ".jpg"
    if infile != outfile:
        try:
            Image.open(infile).save(outfile)
        except IOError:
            print "cannot convert", infile

A second argument can be supplied to the save method which explicitly specifies a file format. If
you use a non-standard extension, you must always specify the format this way:

Create JPEG Thumbnails


import os, sys
import Image

size = 128, 128

for infile in sys.argv[1:]:
    outfile = os.path.splitext(infile)[0] + ".thumbnail"
    if infile != outfile:
        try:
            im = Image.open(infile)
            im.thumbnail(size)
            im.save(outfile, "JPEG")
        except IOError:
            print "cannot create thumbnail for", infile

It is important to note that the library doesn’t decode or load the raster data unless it really has to.
When you open a file, the file header is read to determine the file format and extract things like
mode, size, and other properties required to decode the file, but the rest of the file is not
processed until later.
This means that opening an image file is a fast operation, which is independent of the file size
and compression type. Here’s a simple script to quickly identify a set of image files:

Identify Image Files


import sys
import Image

for infile in sys.argv[1:]:
    try:
        im = Image.open(infile)
        print infile, im.format, "%dx%d" % im.size, im.mode
    except IOError:
        pass

Cutting, Pasting and Merging Images


The Image class contains methods allowing you to manipulate regions within an image. To
extract a sub-rectangle from an image, use the crop method.
Copying a subrectangle from an image

box = (100, 100, 400, 400)
region = im.crop(box)

The region is defined by a 4-tuple, where coordinates are (left, upper, right, lower). The Python
Imaging Library uses a coordinate system with (0, 0) in the upper left corner. Also note that
coordinates refer to positions between the pixels, so the region in the above example is exactly
300x300 pixels.
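
The size arithmetic can be checked with a small sketch. Because the coordinates lie between pixels, the width is simply right minus left and the height is lower minus upper. The helper name box_size is hypothetical and is not part of the library.

```python
def box_size(box):
    """Return (width, height) of a (left, upper, right, lower) box.

    Hypothetical helper: coordinates sit between pixels, so no +1 is needed.
    """
    left, upper, right, lower = box
    return right - left, lower - upper

print(box_size((100, 100, 400, 400)))
```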
The region could now be processed in a certain manner and pasted back.
Processing a subrectangle, and pasting it back

region = region.transpose(Image.ROTATE_180)
im.paste(region, box)

When pasting regions back, the size of the region must match the given region exactly. In
addition, the region cannot extend outside the image. However, the modes of the original image
and the region do not need to match. If they don’t, the region is automatically converted before
being pasted (see the section on Colour Transforms below for details).
Here’s an additional example:

Rolling an image

def roll(image, delta):
    "Roll an image sideways"

    xsize, ysize = image.size

    delta = delta % xsize
    if delta == 0: return image

    part1 = image.crop((0, 0, delta, ysize))
    part2 = image.crop((delta, 0, xsize, ysize))
    # crop may be lazy in some versions of the library, so load both
    # parts before the image is modified by the paste calls
    part1.load()
    part2.load()
    image.paste(part2, (0, 0, xsize-delta, ysize))
    image.paste(part1, (xsize-delta, 0, xsize, ysize))

    return image
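
The effect of the two crop and paste steps can be seen on a plain grid of numbers. The sketch below is a stand-in that uses nested lists instead of an image; the name roll_rows is hypothetical.

```python
def roll_rows(rows, delta):
    """Roll every row of a 2-D grid sideways by delta columns.

    Hypothetical list-based analogue of the roll function above.
    """
    xsize = len(rows[0])
    delta = delta % xsize
    if delta == 0:
        return rows
    # Columns delta..xsize (part2) move to the left edge,
    # columns 0..delta (part1) wrap around to the right edge.
    return [row[delta:] + row[:delta] for row in rows]

grid = [[0, 1, 2, 3],
        [4, 5, 6, 7]]
print(roll_rows(grid, 1))
```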

For more advanced tricks, the paste method can also take a transparency mask as an optional
argument. In this mask, the value 255 indicates that the pasted image is opaque in that position
(that is, the pasted image should be used as is). The value 0 means that the pasted image is
completely transparent. Values in-between indicate different levels of transparency.
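
As a rough model of what the mask does to a single 8-bit value, the sketch below blends a pasted value over an existing one. It assumes a simple linear blend with rounding; the library's exact arithmetic may differ, and the name blend_pixel is hypothetical.

```python
def blend_pixel(dst, src, mask):
    """Blend one 8-bit pasted value (src) over an existing value (dst).

    Assumed linear model: mask 255 keeps src, mask 0 keeps dst.
    """
    return (src * mask + dst * (255 - mask) + 127) // 255

opaque = blend_pixel(dst=10, src=200, mask=255)      # pasted value used as is
transparent = blend_pixel(dst=10, src=200, mask=0)   # existing value kept
partial = blend_pixel(dst=10, src=200, mask=128)     # roughly halfway
print(opaque, transparent, partial)
```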
The Python Imaging Library also allows you to work with the individual bands of a multi-band
image, such as an RGB image. The split method creates a set of new images, each containing one
band from the original multi-band image. The merge function takes a mode and a tuple of
images, and combines them into a new image. The following sample swaps the three bands of an
RGB image:
Splitting and merging bands

r, g, b = im.split()
im = Image.merge("RGB", (b, g, r))

Note that for a single-band image, split returns the image itself. To work with individual colour
bands, you may want to convert the image to “RGB” first.

Geometrical Transforms
The Image class contains methods to resize and rotate an image. The former takes a tuple giving
the new size, the latter the angle in degrees counter-clockwise.
Simple geometry transforms

out = im.resize((128, 128))


out = im.rotate(45) # degrees counter-clockwise

To rotate the image in 90 degree steps, you can either use the rotate method or the transpose
method. The latter can also be used to flip an image around its horizontal or vertical axis.
Transposing an image

out = im.transpose(Image.FLIP_LEFT_RIGHT)
out = im.transpose(Image.FLIP_TOP_BOTTOM)
out = im.transpose(Image.ROTATE_90)
out = im.transpose(Image.ROTATE_180)
out = im.transpose(Image.ROTATE_270)

There’s no difference in performance or result between transpose(ROTATE) and corresponding
rotate operations.
A more general form of image transformations can be carried out via the transform method. See
the reference section for details.

Colour Transforms
The Python Imaging Library allows you to convert images between different pixel
representations using the convert method.
Converting between modes

im = Image.open("lena.ppm").convert("L")

The library supports transformations between each supported mode and the “L” and “RGB”
modes. To convert between other modes, you may have to use an intermediate image (typically
an “RGB” image).

Image Enhancement
The Python Imaging Library provides a number of methods and modules that can be used to
enhance images.
Filters
The ImageFilter module contains a number of pre-defined enhancement filters that can be used
with the filter method.
Applying filters

from PIL import ImageFilter
out = im.filter(ImageFilter.DETAIL)

Point Operations
The point method can be used to translate the pixel values of an image (e.g. image contrast
manipulation). In most cases, a function object expecting one argument can be passed to this
method. Each pixel is processed according to that function:
Applying point transforms

# multiply each pixel by 1.2


out = im.point(lambda i: i * 1.2)

Using the above technique, you can quickly apply any simple expression to an image. You can
also combine the point and paste methods to selectively modify an image:
Processing individual bands

# split the image into individual bands
source = im.split()

R, G, B = 0, 1, 2

# select regions where red is less than 100
mask = source[R].point(lambda i: i < 100 and 255)

# process the green band
out = source[G].point(lambda i: i * 0.7)

# paste the processed band back, but only where red was < 100
source[G].paste(out, None, mask)

# build a new multiband image
im = Image.merge(im.mode, source)

Note the syntax used to create the mask:

imout = im.point(lambda i: expression and 255)

Python evaluates only as much of a logical expression as is necessary to determine the
outcome, and returns the last value examined as the result of the expression. So if the expression
above is false (0), Python does not look at the second operand, and thus returns 0. Otherwise, it
returns 255.
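
The short-circuit behaviour can be tried without an image. The sketch below builds the table of results that such an expression produces for every possible 8-bit value; the name make_mask_table is hypothetical. Note that when the test is a comparison, the false case is the value False, which compares equal to 0, so int() is used here to make the 0 explicit.

```python
def make_mask_table(predicate):
    """Map each 8-bit value to 255 where predicate holds and 0 elsewhere.

    Hypothetical helper illustrating `expression and 255`.
    """
    # When predicate(i) is false, `and` stops early and returns that
    # falsy value; otherwise it returns the second operand, 255.
    return [int(predicate(i) and 255) for i in range(256)]

table = make_mask_table(lambda i: i < 100)
print(table[99], table[100])
```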

Enhancement
For more advanced image enhancement, you can use the classes in the ImageEnhance module.
Once created from an image, an enhancement object can be used to quickly try out different
settings.
You can adjust contrast, brightness, colour balance and sharpness in this way.
Enhancing images

from PIL import ImageEnhance

enh = ImageEnhance.Contrast(im)
enh.enhance(1.3).show("30% more contrast")

Image Sequences
The Python Imaging Library contains some basic support for image sequences (also called
animation formats). Supported sequence formats include FLI/FLC, GIF, and a few experimental
formats. TIFF files can also contain more than one frame.
When you open a sequence file, PIL automatically loads the first frame in the sequence. You can
use the seek and tell methods to move between different frames:
Reading sequences

from PIL import Image

im = Image.open("animation.gif")
im.seek(1) # skip to the second frame

try:
    while 1:
        im.seek(im.tell()+1)
        # do something to im
except EOFError:
    pass # end of sequence

As seen in this example, you’ll get an EOFError exception when the sequence ends.
Note that most drivers in the current version of the library only allow you to seek to the next
frame (as in the above example). To rewind the file, you may have to reopen it.
The following iterator class lets you use the for-statement to loop over the sequence:
A sequence iterator class

class ImageSequence:
    def __init__(self, im):
        self.im = im
    def __getitem__(self, ix):
        try:
            if ix:
                self.im.seek(ix)
            return self.im
        except EOFError:
            raise IndexError # end of sequence

for frame in ImageSequence(im):
    # ...do something to frame...
    pass
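
The iterator class relies on Python's older sequence protocol: a for-loop calls __getitem__ with 0, 1, 2 and so on until IndexError is raised. The sketch below exercises the class without an image file, using a hypothetical stand-in object (FakeSequence) that offers only seek and tell and raises EOFError past its last frame.

```python
class FakeSequence:
    """Hypothetical stand-in for a multi-frame image with seek() and tell()."""
    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.frame = 0
    def seek(self, ix):
        if ix >= self.n_frames:
            raise EOFError("no more frames")
        self.frame = ix
    def tell(self):
        return self.frame

class ImageSequence:
    def __init__(self, im):
        self.im = im
    def __getitem__(self, ix):
        try:
            if ix:
                self.im.seek(ix)
            return self.im
        except EOFError:
            raise IndexError  # tells the for-loop that the sequence has ended

# The loop stops on its own once seek() runs past the last frame.
visited = [frame.tell() for frame in ImageSequence(FakeSequence(3))]
print(visited)
```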
Chapter 4: Core Program Source Code
# Importing key modules
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from pytesser import *
from PIL import Image

# UI implementation
class filedialogdemo(QWidget):
    def __init__(self, parent = None):
        super(filedialogdemo, self).__init__(parent)

        layout = QVBoxLayout()

        self.le = QLabel("Optical Character Recognition - GNDIT Project")
        self.le.setAlignment(Qt.AlignCenter)

        # size of window
        self.resize(250, 300)
        layout.addStretch(1)
        self.btn = QPushButton("Click here to select Picture")
        self.btn.resize(self.btn.sizeHint())
        #self.btn.move(550, 100)
        self.btn.setFixedWidth(300)
        self.btn.clicked.connect(self.getfile)
        # center aligns the window
        self.move(QApplication.desktop().screen().rect().center() - self.rect().center())

        # path of the selected image; empty until a file is picked
        self.path = ''

        self.btn1 = QPushButton("Show data")
        self.btn1.setFixedWidth(300)
        self.btn1.clicked.connect(lambda :self.ocr(self.path))

        layout.addWidget(self.le)
        layout.addWidget(self.btn)
        layout.addWidget(self.btn1)

        self.contents = QTextEdit()
        self.contents.setMaximumHeight(600)
        layout.addWidget(self.contents)
        self.setLayout(layout)
        self.setWindowTitle("Optical Character Recognition")

    # Image file picker implementation
    def getfile(self):
        fname = QFileDialog.getOpenFileName(self, 'Open file',
            '/home/kumaramit1996/Pictures/',"Images (*.png *.jpg *.gif)")
        # show a preview of the selected image in the label
        pixmap = QPixmap(fname)
        pixmap = pixmap.scaledToWidth(300)
        self.le.setPixmap(pixmap)
        self.move(QApplication.desktop().screen().rect().center() - self.rect().center())
        self.path = str(fname)

    # Text file picker implementation
    def getfiles(self):
        dlg = QFileDialog()
        dlg.setFileMode(QFileDialog.AnyFile)
        dlg.setFilter('Text files (*.txt)')
        filenames = QStringList()

        if dlg.exec_():
            filenames = dlg.selectedFiles()
            f = open(str(filenames[0]), 'r')

            with f:
                data = f.read()
                self.contents.setText(data)

    # OCR conversion
    def ocr(self, path):
        # run recognition only after an image has been selected
        if path != '':
            # a single call is enough; graceful_errors=True lets pytesser
            # convert an incompatible image before running Tesseract
            text = image_file_to_string(path, graceful_errors=True)
            self.contents.setText(text)

# Main implementation
def main():
    app = QApplication(sys.argv)
    ex = filedialogdemo()
    ex.show()
    sys.exit(app.exec_())

# Calling main
if __name__ == '__main__':
    main()

Chapter 5: Live Example

Fig-Source Image (image whose text is to be extracted)

Fig-Beta Interface of pyOCR

Fig-Picking Source File

Fig-Beta Interface (before conversion)

Fig-Beta Interface (converted text is selected)

Chapter 6: Conclusions
6.1 Results

After creating our English character and language models, we assessed the accuracy of the pyOCR
software. We were able to recognize English special characters and increase the overall accuracy.
Using a character-based approach to measure accuracy, we raised the rate of correct recognition
by about 8 percentage points: the original accuracy with the default English character model was
66% on a sample of 1700 characters, and our character model increased this to 74.5%. We
calculated the accuracy manually because the ground truth data for this text did not exist in
digital form. From our tests, we conclude that our character and language models yield
significantly better results than the default English models.
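
For reference, a character-based accuracy figure of this kind can also be computed automatically once ground truth is available in digital form. The sketch below uses one common definition, one minus the edit distance divided by the length of the ground truth; this definition is an assumption and is not necessarily the procedure used for the manual count above. The function names are hypothetical.

```python
def edit_distance(truth, output):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(output) + 1))
    for i, t in enumerate(truth, start=1):
        current = [i]
        for j, o in enumerate(output, start=1):
            cost = 0 if t == o else 1
            current.append(min(previous[j] + 1,          # character missing from output
                               current[j - 1] + 1,       # extra character in output
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]

def character_accuracy(truth, output):
    """Fraction of ground-truth characters recognized correctly (assumed definition)."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    errors = edit_distance(truth, output)
    return max(0.0, 1.0 - float(errors) / len(truth))

# One wrong character out of 17 in a made-up example.
print(character_accuracy("optical character", "optical charactor"))
```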

6.2 Conclusions on pyOCR

The goal of pyOCR is to provide an accessible, flexible, and simple tool to perform optical
character recognition. In its current state, it is not the most user-friendly utility and still
has many kinks to work out. This is understandable because it is in an alpha stage of
development and will require more attention before an official release. The theory behind
character recognition is already in place in the software: pyOCR handles the preprocessing and
segmentation of images well and allows many fine adjustments to meet a variety of user needs. It
is now a matter of reorganizing and optimizing the code to create a user-friendly experience.
With time, we believe pyOCR will be one of the leading names in optical character recognition
software.

6.3 Future Work

As we expect to extend the current version of pyOCR to the Devanagari script, we are becoming
familiar with the challenges presented by accented characters and are working to handle them. We
therefore anticipate that a future extension of pyOCR to most languages based on Asian scripts
will be straightforward, building on the current version.

For languages with different alphabets, such as Chinese and Arabic, we think a future project
could adapt pyOCR to vertical and right-to-left character recognition, since at the language
model level we defined Unicode as the standard encoding. This is consistent with the need to
represent most written languages in a single encoding for further extensions to other languages.
Training will then be the key to both the correct representation and the clustering of any new
set of characters or alphabets.

As mentioned in section 2.3.2, the pyOCR software is run through multiple commands, one for each
step of the recognition process, starting with preprocessing and segmentation and ending with
the use of the character and language models. We believe it would be useful to combine these
commands into a single command. This would save considerable time during future revisions of the
software, since extensive testing requires running it many times. Such a command could take
flags for the different operations within the digitization pipeline; any flag that is omitted
would fall back to a default value for ease of use.
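
One possible shape for such a single command is sketched below with the standard argparse module. Every flag name, default value, and step label here is hypothetical and chosen only for illustration; none of them exists in the current software.

```python
import argparse

def build_parser():
    """Command-line interface for a hypothetical one-step OCR pipeline."""
    parser = argparse.ArgumentParser(description="Run the full OCR pipeline in one command.")
    parser.add_argument("image", help="path to the scanned image")
    # All flags below are assumptions; each has a default so it can be omitted.
    parser.add_argument("--skip-preprocess", action="store_true",
                        help="skip the preprocessing step")
    parser.add_argument("--char-model", default="default",
                        help="character model to use for recognition")
    parser.add_argument("--lang-model", default="default",
                        help="language model to use for error correction")
    return parser

def plan_steps(args):
    """Return the ordered list of pipeline steps implied by the flags."""
    steps = []
    if not args.skip_preprocess:
        steps.append("preprocess")
    steps.append("segment")
    steps.append("recognize:" + args.char_model)
    steps.append("correct:" + args.lang_model)
    return steps

# Only the character model is overridden; everything else uses its default.
args = build_parser().parse_args(["page.png", "--char-model", "english-special"])
print(plan_steps(args))
```

Keeping the step planning separate from the argument parsing would make it possible to test the pipeline order without running any recognition.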
