Genome analysis
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
S.B.Mudunuri and H.A.Nagarajaram
IMEx: Imperfect Microsatellite Extractor
3 IMPLEMENTATION
IMEx has been developed in standard C and has undergone
extensive preliminary testing and comparison with other
existing tools, yielding satisfactory results. The program can
read a sequence of any length because memory is allocated
dynamically; the practical size limit depends on the system
configuration. The program has been successfully tested on the
human X chromosome (147 MB) on a system with an Intel Xeon
processor and 2 GB RAM. A web server has also been created
and can be accessed at http://203.197.254.154/IMEX/ or
http://www.cdfd.org.in/imex. The web server has been developed
using CGI-Perl. HTML forms collect the input sequences and the
parameters used by the C program, and the results are displayed
in the browser. The stand-alone program can be downloaded from
the ‘downloads’ section of the web server homepage.
Input to the program consists of a sequence file and the
following parameters: (a) number of edit operations per motif (k);
(b) percentage imperfection for the entire tract (p); (c)
minimum repeat number (n); (d) coding information file. The
web version offers three different modes of access to the
program: basic, intermediate and advanced. The basic mode contains very few options to be set by the user: it runs with default values, except for an option to select either perfect or imperfect microsatellites. The default parameters of IMEx are as follows: the imperfection percentage (p) is 10% for all repeat sizes; the imperfection limit per repeat unit (k) depends on the repeat size (Mono: 1, Di: 1, Tri: 1, Tetra: 2, Penta: 2, Hexa: 3); and the minimum number of repeat units (n) is 2 for all repeat sizes, i.e. any repeat unit that is repeated at least twice is reported. The intermediate mode offers a few options: the user can adjust the p value for all repeat tracts, the k value for each repeat unit size and some other options. The advanced mode offers all the options available for the program, so the user can adjust every available parameter; it can also set the size limit of the flanking sequences, switch on text output, search for a particular pattern, etc. The interface has been designed for the convenience of the users. Using IMEx, the user can also search for a particular pattern (such as CAG repeats), for repeats of a particular size (di or tetra), for perfect repeats only, or for a combination of perfect and imperfect repeats.

Fig. 3. Flowchart of IMEx algorithm.

removes redundancies. For example, ATGCCCATGCCC is identified as (ATGCCC)2 only and the internal repeat of C within the hexanucleotide motif is ignored.

While detecting the microsatellite tract as a tandem repeat of a motif, IMEx also simultaneously stores the edit operations (indels and substitutions). A pairwise alignment between the identified tract and its perfect counterpart is nevertheless produced to indicate the matches, mismatches and gaps. A sample alignment produced by IMEx is shown in Figure 4. Along with the alignments, the details of the repeat tract, such as consensus (repeating unit), number of iterations, tract length, imperfection percentage, nucleotide composition and coding region (if the tract is in a coding region) or flanking coding regions (if it is in a non-coding region), are written to a file in the form of a table. IMEx uses a .ptt file (NCBI’s protein table file) for protein-coding region information.
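The coding/non-coding lookup described above can be illustrated with a minimal sketch. It assumes that each record of the .ptt file carries a Location field of the form start..end (1-based, inclusive) and that a tract counts as coding when it overlaps such an interval; the names CodingRegion, parse_location and find_coding_region are hypothetical and are not taken from the IMEx source, and the actual rule IMEx applies to partially overlapping tracts is not stated here.

```c
#include <stdio.h>

/* One protein-coding interval, 1-based and inclusive, as assumed to be
   given in the Location column of a .ptt file (e.g. "190..255"). */
typedef struct {
    long start;
    long end;
} CodingRegion;

/* Parse a Location field of the form "start..end".
   Returns 1 on success, 0 if the text does not match that form. */
int parse_location(const char *field, CodingRegion *out)
{
    long s, e;
    if (sscanf(field, "%ld..%ld", &s, &e) != 2 || s > e)
        return 0;
    out->start = s;
    out->end = e;
    return 1;
}

/* Return the index of the first region that overlaps the tract
   [tract_start, tract_end], or -1 if the tract overlaps none of them.
   Treating any overlap as "coding" is an assumption of this sketch. */
int find_coding_region(const CodingRegion *regions, int count,
                       long tract_start, long tract_end)
{
    int i;
    for (i = 0; i < count; i++) {
        if (tract_start <= regions[i].end && tract_end >= regions[i].start)
            return i;
    }
    return -1;
}
```

A caller would parse every Location field once into an array of CodingRegion values and then call find_coding_region for each reported tract; a return value of -1 marks the tract as non-coding, in which case the nearest intervals on either side would supply the flanking coding regions.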
The program generates two files. The first gives a summary table describing the microsatellite tracts, including tract size, number of iterations, percentage imperfection, nucleotide composition and coding/non-coding information. The second file contains the alignment of each repeat with its consensus sequence. Both files are produced in HTML as well as in text format. The text files can be downloaded and used for further studies. In the HTML output, the files are linked so that clicking a repeat displays its corresponding alignment in a separate HTML page. A link has also been provided to know the function of the coding region near which microsatellite is

Min Score: 2) which yielded a substantial number of microsatellites. This is because the length of a microsatellite detected by TRF depends on the value of Min Score. For Sputnik also, we used the least stringent parameters (Match: +1, Mismatch: 3, Min Score: 5). For IMEx, we set the ‘p’ value of all tracts to 10% and the ‘k’ value for each pattern size to Mono: 1, Di: 1, Tri: 1, Tetra: 2, Penta: 2, Hexa: 3, and further restricted the output to those microsatellites with a minimum repeat copy number (Mono: 5, Di: 3, Tri: 2, Tetra: 2, Penta: 2, Hexa: 2) to match those reported by TRF and Sputnik. TRF and Sputnik identified 50 and 19 repeats respectively, whereas IMEx identified 146 microsatellite tracts (Table 1). In fact, IMEx
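The IMEx settings quoted for this comparison (p, and k and n per motif size) can be written down as a small parameter table with a filter that applies them. This is only a sketch: the type and function names (ImexParams, Tract, tract_is_reported) are hypothetical, the imperfection percentage is taken as an already computed value because its formula is not given here, and treating every threshold as inclusive is an assumption.

```c
#define MAX_MOTIF_SIZE 6   /* mono- to hexanucleotide repeats */

typedef struct {
    double p;                 /* max imperfection % of the whole tract */
    int k[MAX_MOTIF_SIZE];    /* max edit operations per repeat unit, by motif size */
    int n[MAX_MOTIF_SIZE];    /* min number of repeat units, by motif size */
} ImexParams;

/* Values quoted in the text for the comparison run. */
static const ImexParams comparison_params = {
    10.0,
    {1, 1, 1, 2, 2, 3},       /* Mono, Di, Tri, Tetra, Penta, Hexa */
    {5, 3, 2, 2, 2, 2}
};

/* Hypothetical summary of one detected tract. */
typedef struct {
    int motif_size;           /* 1 (mono) to 6 (hexa) */
    int repeat_units;         /* number of copies of the motif */
    int max_edits_in_unit;    /* largest edit count seen in any one unit */
    double imperfection_pct;  /* imperfection of the whole tract, precomputed */
} Tract;

/* Return 1 if the tract satisfies all three thresholds, 0 otherwise. */
int tract_is_reported(const ImexParams *params, const Tract *t)
{
    int idx;
    if (t->motif_size < 1 || t->motif_size > MAX_MOTIF_SIZE)
        return 0;
    idx = t->motif_size - 1;
    return t->repeat_units >= params->n[idx]
        && t->max_edits_in_unit <= params->k[idx]
        && t->imperfection_pct <= params->p;
}
```

Keeping k and n as arrays indexed by motif size mirrors the way the text states one value per repeat size; the default settings differ from this table only in n, which is 2 for every size.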
Table 1. Continued
Locus in bp Microsatellite tract Motif

Table 2. Comparison of execution times (in seconds) of TRF, Sputnik
and IMEx. The programs were run on an Intel Xeon Dual Processor
3.2 GHz Linux server
perfect as well as imperfect microsatellites; (ii) get the
coding/non-coding information of the microsatellite tracts;
(iii) generate alignments with their perfect counterparts
to know about substitutions and indels; (iv) restrict the
imperfection limit for repeat unit of each size; (v) set the
imperfection percentage threshold of the entire tract of
each repeat size; (vi) restrict the minimum number of repeat
units of a tract of each size; (vii) search for repeats of a

ACKNOWLEDGEMENTS
The authors would like to thank Mr Pankaj Kumar,
Mr Mohammad Anwaruddin, Dr V.B.Sreenu and
Mr Suprabhat Reddy for their valuable suggestions and
assistance. A grant from the Department of Biotechnology
(DBT), India is gratefully acknowledged. The authors also
thank the anonymous referees for their critical and constructive
comments.

Conflict of Interest: none declared.