You are on page 1of 12

Vol. 20 no.

5 2004, pages 758–769


BIOINFORMATICS DOI: 10.1093/bioinformatics/btg482

STAM: simple Transmembrane Alignment Method


Yinon Shafrir∗ and H. Robert Guy
National Cancer Institute, LECB, MSC 5677, 12 South Drive, Bethesda,
MD 20892-5677, USA

Received on August 5, 2003; revised on October 2, 2003; accepted an October 14, 2003
Advance Access publication January 29, 2004

ABSTRACT the cell, TMPs’ comprise a much smaller fraction of known


Motivation: The database of transmembrane protein (TMP) protein structures in the PDB database (Arai et al., 2003; Boyd
structures is still very small. At the same time, more and more et al., 1998).
TMP sequences are being determined. Molecular modeling is The first step towards a successful homology model is
an interim answer that may bridge the gap between the two aligning the target proteins sequence with a sequence of
databases. a protein with a known structure. An ‘optimal’ alignment
The first step in homology modeling is to achieve a good maximizes a ‘matching’ function between the sequences.
alignment between the target sequences and the template There are many approaches to solve this fundamental problem
structure. However, since most algorithms to obtain the in bioinformatics, for a basic review see (Mount, 2001).
alignments were constructed with data derived from globular Most methods treat the protein’s sequence as a single entity
proteins, they perform poorly when applied to TMPs. and manipulate it as such. A set of alignment parameters
In our application, we automate the alignment procedure and is applied to the entire sequence without any weight given
design it specifically for TMP. We first identify segments likely to different regions. While this approach is successful when
to form transmembrane α-helices. We then apply different applied to water-soluble proteins, the situation is different
sets of criteria for transmembrane and non-transmembrane for membrane proteins. TMPs reside in two different physio-
segments. For example, the penalty for insertion/deletions in chemical environments simultaneously. As a consequence,
the transmembrane segments is much higher than that of a most membrane-spanning segment has a regular secondary
penalty in the loop region. Different substitution matrices are structure [simple transmembrane alignment method (STAM)
used since the frequencies of occurrence of the various amino deals only with transmembrane α-helices], while the extra-
acids differ for transmembrane segments and water-soluble cellular and intracellular regions are more likely to possess
domains. flexible irregular loops. Furthermore, the transmembrane seg-
Results: This program leads to better models since it does not ments are composed primarily of hydrophobic residues, in
treat the protein as a single entity with the same properties, stark contrast to that of the connecting loops, which are gen-
but accounts for the different physical properties of the various erally much more hydrophilic. These two major differences
segments. STAM is the first multisequence alignment program have a significant impact on the choice of parameters for multi-
that is directly targeted at transmembrane proteins. sequence alignment process performed on TMP sequences.
Availability: Source code and installation package are avail- The two main parameters in multi-sequence alignment are
able on request from the authors. Web access is currently gap penalties associated with opening gaps in the sequence
implemented. (and hence in the putative structure), and a substitution mat-
Contact: yinon@nih.gov rix that is related to the probabilities of residue type mutating
to another residue type.
INTRODUCTION The secondary structure restriction implies that introduction
Homology modeling has become a central tool in investigating of insertions/deletions in the transmembrane segments should
proteins structure. In many cases when protein structure is be rare (such as when a single proline kinks the helix), while
not available through standard experimental techniques (such gaps could be applied more liberally in the loop regions that
as X-ray crystallography), homology modeling may provide lacks any regular secondary structure.
significant information regarding the protein’s structure and Substitution matrices reflect the tendency of different
function. The need for computational modeling becomes more residue types to mutate. Residues exposed on protein sur-
acute when one considers transmembrane proteins (TMPs). faces mutate more than those buried within the protein
While they may account for up to one-third of the proteins in (Komiya et al., 1988; Overington et al., 1992). Residues of
transmembrane segments that are exposed to lipids tend to
∗ To whom correspondence should be addressed. be very hydrophobic, whereas surface residues of soluble

758 Published by Oxford University Press

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
STAM: simple Transmembrane Alignment Method

proteins are exposed to water and tend to be hydrophilic. adaptive systems. This technique relies on pre-constructed
These propensity and conservation differences suggest that profiles of transmembrane segments and comparing the target
substitution matrices that were constructed using water- sequences against these profiles. The rate of successful identi-
soluble proteins alignments as reference data, are inappropri- fication is slightly higher in all these methods and depends on
ate for the analysis of TMPs. Various groups have developed the quality of the profiles alignment (Chen et al., 2002; Kall
substitution matrices specifically tailored to TMP, and indeed and Sonnhammer, 2002).
they differ significantly from the standard matrices (Jones While in theory the concept of parameters dependency
et al., 1994; Ng et al., 2000). on topology is not new, and applied manually (and sub-
Alignment of sequences within the same family is usually a jectively) for many years, STAM is the first application to
straightforward process. The sequences’ identity in such cases combine secondary structure prediction (not known structure)
is high enough to achieve a meaningful alignment without with alignment parameters variability, in order to address the
substantial effects due to the choice of substitution matrix or special problem of multi-sequence alignment of transmem-
insertion/deletion penalties. However, when the sequences set brane proteins. In this sense STAM is different from global
include members that belong to the same superfamily but not alignment programs, in that it breaks the sequences into seg-
necessarily to the same family, the homology identity can fall ments. It is also different from local alignment algorithms by
below the 30% threshold, and the importance of the alignment maintaining the structural integrity of the transmembrane seg-
parameters in obtaining a correct alignment becomes more ments without much emphasis on local similarities. It differs
evident. For example, two ion-channel proteins can belong from both classes of programs by the applications of different
to two distinct families with little or no obvious sequence alignment parameters at different sections of the sequence.
identity; yet share the same folding topology. In this situ-
ation, standard alignment programs encounter problems, and
in many cases fail to align functionally important domains. ALGORITHM
Jones and colleagues already wrote in 1994: ‘Considering The basic structure of the algorithm is described in the flow
the differences in the mutability patterns observed between chart (Fig. 1). The first step is to predict the general topo-
a lipid and non-lipid environment, clearly when trying to logy of each sequence by distinguishing between two types of
align distantly related transmembrane segments it is vital to segments: transmembrane and non-transmembrane. This sort-
bear these differences in mind. Alignment programs that use ing is achieved by applying improved hydrophobicity analysis
the transmembrane matrix for transmembrane regions (either (von Heijne, 1992) to the sequences.
experimentally determined or predicted) and a general muta- Different scales can be used (e.g., Guy, 1985 or Kyte and
tion data matrix for the polar flanking regions are likely to Doolittle, 1982), as well as sliding windows sizes. Standard
perform much better than programs that use a single matrix’. hydropathy analysis with a single window has been shown to
(Jones et al., 1994). be unreliable (Fariselli et al., 2003). STAM uses two sliding
In addition, the question of variable penalty gaps as a windows (the trapezoidal method) and a sensitive peak detec-
function of the secondary structures is an old one. In 1989, tion method that allows it to identify segments even with
Henneke published an algorithm that takes into account the low hydrophobic signature. From our experience, using this
known secondary structure of a sequence and adjust the gap improved method, the success rate of correctly identifying the
penalty accordingly. In 1992, Smith and Smith published the true type of the segment seems to be relatively insensitive to
pattern-induced multi-sequence alignment (PIMA) algorithm the specific scale used as long as it is a realistic one.
and showed that ‘secondary structure information can signific- The hydrophobicity-based method is enhanced further by
antly improve the accuracy of aligning structure boundaries helicity propensity analysis that tries to minimize the number
in a set of homologous sequences even when the structure of false positives (over-prediction of transmembrane seg-
of only one member of the family is known’. However, as ments) or false negatives (under-prediction). For example,
was mentioned before, only a handful of TMPs have known many hydropathy-based programs often fail to identify the
structure (Smith and Smith, 1992). S4 segment in voltage gated 6TM ion channels. The added
At the same time, many methods of identifying the pos- helicity analysis helps STAM to identify it in most cases. This
ition of the transmembrane segments in the sequence have propensity scale is described in the work by Persson and Argos
been developed. The simplest ones rely on scales that indic- (1994).
ate the relative hydrophobicity of the different amino acids. STAM’s goal is not to predict accurately the transmem-
The basic assumption is that long stretches of hydrophobic brane topology of all the sequences in the sequences set,
residues in a TMP correspond with the actual transmembrane but rather to achieve a biophysical meaningful alignment.
segments. Many scales have been suggested based on different As long as a sizable number of the sequences have the
approaches, with similar success in predicting the location of same topology and are cataloged as such, the resulting align-
the transmembrane parts. Later on, the prediction algorithms ment will not exhibit dependence on the individual sequences
shifted towards approaches based on neural networks and that comprise the profile. Thus, the method of establishing

759

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
Y.Shafrir and H.R.Guy

Fig. 2. The identification and extraction of the putative helical core


from the alignment. The ‘chopped’ segments will be realigned with
the ‘chopped’ segments from the neighboring helices as loops.

that belong to the distribution bin that has most members will
constitute a basis for the overall alignment. The rest of the
sequences are stored for future alignment.
The segments are then aligned in sequential order. All the
corresponding transmembrane segments are aligned together
under very restrictive conditions. The gap-opening penalty is
set to a very high value and the substitution matrix is set to
reflect the different substitution probabilities in a transmem-
Fig. 1. The flowchart of STAM’s algorithm. See text for more details.
brane region. From this alignment, a transmembrane core is
extracted. This core is the block of columns that have no gaps
the transmembrane topology could be any of the common at any positions. The ‘fringe residues’ are then chopped and
algorithms available today or will be available in the future. assigned to the surrounding loop regions. This ensures that
In addition to this automated prediction and assignment step, errors in identifying the edges of the transmembrane seg-
we gave the user the option to impose his/her own prediction ments may have a second opportunity to be aligned correctly
before the alignment step using a combination of methods and (example at Fig. 2). STAM also detects segments that diverge
via a ‘majority consensus vote’ obtain better confidence in the significantly from the block form (such as to make the core
prediction (Nilsson et al., 2002) or preferably experimentally region smaller than preset number of residues) and removes
obtained data. them from the alignment on the assumption that they are false
STAM should not be used as a general alignment pro- positives.
gram. Water-soluble proteins that have core hydrophobic Rather then reinvent the wheel, we used ClustalW algorithm
regions will be misidentified and cause major alignment (Thompson et al., 1994) to align the segments, though the user
errors. For water-soluble proteins, the standard alignment can easily modify the source code to use a different alignment
methods are adequate, and should be used. When TMPs method. We chose ClustalW, since the source code is freely
are aligned (with STAM or any other alignment program), available and allows the user to modify ClustalW parameters
it is practical to separate the transmembrane region from the (such as extension gap penalty) and its integration into STAM
large soluble domains. This is done for two main reasons: was fairly easy. However, studies have shown that the differ-
(1) soluble domains are often highly variable (among different ences between various alignment programs are comparatively
families) and therefore their alignment is meaningless from a small (Elofsson, 2002; Thompson et al., 1999).
modeling point of view. (2) Large soluble domains typically Once all the transmembrane segments are aligned, the modi-
have properties of water-soluble proteins, thus making them fied loop segments are analyzed. The process is identical to the
poor candidates for alignment by a program that was designed alignment process of the transmembrane segments. This time
specifically for transmembrane regions. however, the alignment parameters are more lax. The penalty
Once the hydrophobic segments have been resolved, an for opening a gap is substantially lower and the substitution
additional process of ‘segments’ conditioning’ is preformed. matrices are the standard PAM or BLOSUM.
This process takes into account the propensity of the residues With this, the major part of the alignment process is
at the edges of the segments to be part of a transmembrane complete. The aligned segments are recombined to aligned
helix (Persson and Argos, 1994). sequences to compose a profile. The remainder of the
STAM then sorts the predicted topologies into distribution sequences that did not participate in the previous alignment
bins. In the best case, if all sequences share the same topo- procedure (due to different identified topology) is now aligned
logy, the distribution will include only one bin. The sequences sequentially to the profile. By imposing a secondary structure

760

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
STAM: simple Transmembrane Alignment Method

mask on the profile, we take advantage of this feature in Correct positioning of the helices
many profile-sequence alignment programs, such as ClustalW We compared STAM alignment of eukaryotic (bovine)
(Thompson et al., 1994) or Hmmer (Durbin et al., 1998; Eddy, rhodopsin (Palczewski et al., 2000), and three bac-
1995). terial rhodopsins: bacteriorhodopsin (Belrhali et al., 1999),
STAM is designed to reduce the impact of topology pre- halorhodopsin (Kolbe et al., 2000) and sensory rhodopsin II
diction errors on the final alignment. There are two common (Royant et al., 2001).
prediction errors. False positives, which is the assignment Even though, the structures of the prokaryotic and euka-
of transmembrane status to a loop or the breakage of a long ryotic families are distant from one another they share many
segment into two shorter ones, and false negatives, which is structural characteristics (Nero and Louis, 1992). Both famil-
the assignment of loop status to a transmembrane segment. ies have a seven transmembrane helix motif, but the bacteri-
In the vast majority of cases this will lead the sequence to orhodopsin helices tend to be shorter and more perpendicular
be cataloged in a different distribution bin than the major- to the membrane.
ity of the sequences (and therefore will not be part of the As we can see in Figure 3A, STAM identified correctly all
core profile) unless all the sequences share the same char- the transmembrane segments and aligned them without gaps
acteristics. If they do, the consistent nature of the prediction in their helical cores. Figure 3B is the ClustalW alignment
algorithm will make the same prediction mistake in all of them of the same sequences. While ClustalW produced essentially
and the alignment will not be much effected. For that reason, the same alignment for the three bacterial sequences, it failed
the consistency of the topology analysis is key to reduce to position the helices between the families correctly. The
the effect misprediction will have on the overall alignment. first three helices of the bovine rhodopsin are completely mis-
Still, unidentified false positives disrupt the final alignment aligned with the helices of the three bacterial rhodopsins. It
more than false negatives, since regions identified as loops is important to note that STAM identified the topology of all
are aligned again after the putative transmembrane helices. the sequences correctly using the KD hydrophobicity scale,
Consequently, STAM will choose a consensus topology with and failed to do so using Guy’s scale. Yet, even with the
the least number of segments between two bins with the misidentification of the topology, STAM managed to align
same number of sequences. The last defense against predic- the sequences correctly because two sequences were enough
tions errors is the removal from the core-alignment segments to build a meaningful secondary-structure mask to include in
that cause the core helical region to be smaller than a pre- the profile/sequence alignment procedure.
set number of residues. These misalignments are often the
result of a prediction mistake along the sequence. However, Insertions/deletions in known helices
even with these measures, prediction mistakes can effect the Crystal structure have been determined for the pore forming
final alignment. Hence, the user is encouraged to inspect the domain of four distantly related bacterial K + channels: KcsA
prediction results and correct them manually if an error is (Doyle et al., 1998), MthK (Jiang et al., 2002), KirBac1.1
detected. (Kuo et al., 2003) and KvAP (Jiang et al., 2003). Figure 4
STAM was implemented in Microsoft Visual Basic compares the alignments obtained using STAM to ClustalW of
version 6 and currently is available by request from the author these proteins and the three closest homologs to each that have
for Windows 2000/XP-based machines (and will be available <50% identity to each other, as was determined by a BLAST
on our future website). We currently are working to port the search (Altschul et al., 1990) of the non-redundant database.
algorithm to more environments. Due to their low level of sequence conservation, the more
peripheral M1 helices (of the 2-TM channels) and S5 helix of
the KvAP, are difficult to align. ClustalW (Fig. 4A) introduces
RESULTS gaps in the known helices in the crystallized structures, and
Comparing the accuracy of alignments is a difficult task even in the putative helical regions of some of the other sequences.
when the sequences involved are of globular proteins. A Based on the crystal structures, both programs misaligned M1
comparison is usually made between the proteins’ sequences of the KirBac family relative to the other families. However,
alignments and the proteins’ three-dimensional (3D) struc- while STAM places the M1 of the KirBac against the M1 of
tural alignments. This approach is not adequate when TMPs the other families, ClustalW places it mostly against insertions
are concerned. The number of known structures of TMP is in the other sequences. If one was to model KirBac on the
very small, and the subset of known TMP structures that are backbone of any of the other families, using STAM will retain
homologs is almost empty. the secondary structure of the sequence while using ClustalW
However, we first examined our method versus ClustalW will result in a disorganized coil spanning the lipid bilayer.
by aligning sequences from families for which at least one
structure has been determined. We looked at three different Sensitivity to motifs in distantly related sequences
major improvements that STAM introduces to the alignment The final example is the alignment of a small number of dis-
of TMPs. tantly related 6TM ion channels. ClustalW perform relatively

761

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
Y.Shafrir and H.R.Guy

Fig. 3. (A) STAM alignment of rhodopsins. (B) ClustalW alignment of the same sequences. The helices’ locations in the crystal structures
are denoted by a black underline. The structures codes in the PDB database are as follows: bacteriorhodopsin—1QHJ, halorhodopsin—1E12,
sensory rhodopsin II—1H68 and bovine rhodopsin—1F88.

well when the number of sequences is large and the proteins does not have the (RXX)n signature]. In addition, some of
are somewhat closely related. Since ClustalW obtains its final the other helices are misaligned as well (especially S5). In
alignment by building a phylogenic guiding tree and clustering this example, STAM avoids these mistakes. It aligns well the
the sequences accordingly, the quality of the final alignment S4 region (7/8 ratio in three positions and 4/8 in one other
depends on the amount of input sequences their relative evol- position). The gaps are missing in the putative transmembrane
utionary distances from one another. When the number of regions, and the gaps concentrate at the highly variable loop
sequences is small and the sequences are very distant, the regions or at the edges of the helices.
quality of the alignment deteriorates. In this case, we aligned
small number of 6TM ion channels that are very distant from
one another. DISCUSSION
Eight different 6TM sequences were aligned: four K+ Chan- When aligning sequences of TMP, there are added difficulties
nels, two Ca2+ channels and two Na+ channels. KvAP and to the ones that are already associated with multi-sequence
Shaker were introduced to the alignment since KvAP is the alignment algorithms. Instead of developing a new alignment
only known structure of a 6TM channel and Shaker is one of algorithm, we constructed an algorithm that uses contem-
the most-studied channels. The results are shown in Figure 5. porary alignment procedures, and applies certain criteria to
In this alignment, ClustalW failed to identify the signature treat the special cases of TMP. It has been shown already that
repetitive motif of arginines at every third position in the most available alignment programs today achieve about the
S4 helix of the voltage-sensitive channels, which is respons- same rate of success regardless of the underlying algorithms
ible for the charge movement through the membrane during (Thompson et al., 1999). The basic concept behind this
gating. The highest relative amount of arginine or lysine in project was not to reinvent the wheel when it comes to multi-
an alignment position is 3/8 [the Escherichia coli segment sequence alignments, but rather use the available tools and

762

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
STAM: simple Transmembrane Alignment Method

Fig. 3. Continued.

techniques and apply the restrictions arising from the unique between the helices have no regular secondary structure and
physio-chemistry of TMP. long water-exposed segments can be aligned with standard
We chose not to base our program on Hidden-Markov- unconditioned methods. As one can see, these are very basic
Models (HMM) approach because HMMs necessitate an assumptions that cover most of the TMP cases (but not
initial seed alignment on which the final alignment will be all). Once the hydrophobic regions identification process is
constructed. This requires the user to provide a good initial complete, the segments are aligned according to their classi-
alignment for the TMP’s in question. If the seed alignment is fication. Transmembrane segments are aligned with high gap
unambiguous, no complications arise. Things become more penalties to prevent distortions in the secondary structure and
problematic when the sequences in question have little overlap with a substitution matrix that represents the substitution prob-
and highly variable. Then the user has to inject his subjective abilities in a transmembrane segment. However, we must note
criteria and this in turn will propagate throughout the final that the gap penalty value is more important to the alignments
alignment. then is the selected matrix. This may reflect the situation today
We prefer to use the more conventional approach of of which only a handful of substitution matrices, tailored spe-
matrix-based alignments and to use standard criteria that are cifically for TMPs, exist. To compound this problem, most
general to TMPs and based on empirical results (such as the of these matrices have close entropy values due to their being
hydrophobic scales). developed from closely related proteins. They therefore rep-
Yet HMMs are extremely fast and powerful tools that need resent only a narrow band in the evolution process of TMPs.
a good initial alignment. STAM may be just the objective tool STAM can be the tool needed to align more distant sequences
to provide these initial alignments. in order to obtain more distant matrices.
As its name implies, STAM, the approach is very Even this naive approach has shown improvement over
simplistic in nature and not perfect. The method is based standard alignment programs. We chose to compare its results
on the following assumptions: (1) long hydrophobic seg- versus these of ClustalW for two reasons: (1) ClustalW is
ments form transmembrane α-helices and (2) short segments one of the most successful and popular alignment programs.

763

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
Y.Shafrir and H.R.Guy

(2) We used ClustalW algorithm as the alignment engine and, Nonetheless, the method that is the basis of STAM strives
therefore, it is reasonable to observe the changes STAM brings to minimize the effects of possible mistakes. Since the
to a ClustalW alignment. However, STAM is a very modular segments are aligned individually, there is no drift of mis-
program and replacing the alignment engine with the user’s alignment errors to the neighboring segments. A major mis-
favorite alignment program is a straightforward procedure. alignment error in transmembrane proteins is the existence of
For example, we found that at the last phase of the alignment, a gap in the transmembrane regions. ClustalW tends to prefer
when the problem type becomes a sequence-profile align- to extend gaps. If one gap is opened, the cost of opening
ment, SAM (Karplus et al., 1998), which is a HMM-based an adjacent gap drops significantly. Though single gaps can
program, performs better than ClustalW in some cases. occur in the transmembrane segments, longer gaps are rare.
STAM does not guarantee a ‘correct’ alignment and some- STAM assigns the same penalty to the extension gap as to the
times it exhibits sensitivity to different input parameters. initial one. This discourages the appearance of long stretches
STAM is very flexible in this regard. It can either work in full of gaps in a transmembrane segment. Because of this, most
automatic mode (in which no input from the user is required), of the misalignments will appear at the ends of the helices
or manual mode (in which the different parameters and topo- and not in the core. This is due to errors in the identification
logy predictions can be altered by users). Users are advised of the helices edges. Since proteins tend to lose their ordered
to consider alternative alignments obtained with different secondary structure near the membrane boundaries, this is an
inputs. acceptable error.

Fig. 4. ClustalW alignment (A) and STAM alignment (B) of the pore-forming domain of different K + channels with solved structure. The
first helices in the known structures are underlined. ClustalW open gaps in many sequences at the first transmembrane helix (S5 in the
KvAP, M1 in all others), making a homology modeling based on its alignment a difficult task. In comparison, STAM maintains the structural
integrity of the helices. The NCBI accession numbers and PDB codes for sequences used in this alignment: 1ORQ (KvAP), gi:15024246
(KvAP-Like1), gi:16411529 (KvAP-Like2), gi:24350062 (KvAP-Like3), 1BL8 (KcsA), gi:27364075 (KcsA-Like1), gi:9657586 (KcsA-
Like2), gi:22406479 (KcsA-Like3), 1LNQ (MthK), gi:1498918 (MthK-Like1), gi:23129256 (MthK-Like2), gi:2648884 (MthK-Like3),
1P7B (KirBac), gi:17130300 (KirBac-Like1), gi:23042578 (KirBac-Like2) and gi:23015860 (KirBac-Like 3).

764

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
STAM: simple Transmembrane Alignment Method

Fig. 4. Continued.

Another aspect that STAM brings to the forefront is the fold. The pore lining helices (S5-P-S6) of K+ channels exhibit
importance and relevancy of the alignment score. Most pro- high degree of conservation since these segments control
grams treat the alignment score as a function that has to be the channel selectivity and ions permeation. Comparing dif-
maximized. Once a maximal value has been attained, the pro- ferent potassium channels based on the pore lining helices
gram declares the corresponding alignment as ‘optimal’. This point unambiguously to a common ancestor. Nevertheless,
approach is somewhat misleading. Since the program does not this domain is much less conserved among channels with
know ‘a-priori’ the secondary structure of the sequence, it can- different ion selectivity, e.g., the pore-forming domain of
not take it into account when maximizing the score function. voltage-gated Ca2+ and K+ channels are quite dissimilar. On
This can lead to major misalignments of helices and breaking the other hand, the composition of the preceding four helices
up the secondary structure. STAM does not try to maximize (S1–S4) depends on the specific gating mechanism of the
the score over the entire sequences but forces the helices cores channels. Thus, this domain is fairly similar for voltage-gated
to be aligned first. The alignments may not look as ‘nice’ and K+ , Ca2+ and Na+ channels but can be very divergent among
be less in a ‘block’ form but they will retain the biophysical K+ channels that are gated differently (Fig. 5).
properties of the sequences. STAM solves this dilemma by generating a new phylogenic
ClustalW approach of building a phylogenic tree enhances tree for each segment’s alignment. This ensures that more
the accuracy of the alignments by aligning the closest similar segments will be aligned first (Fig. 6). In the example
sequences first before progressing to more distant ones. This above, if a 6TM voltage-gated K+ channel is aligned with
is done by pairwise alignment of all the sequences and meas- a 6TM voltage-gated Ca2+ channel and a 6TM Ca-activated
uring the percentage of matched positions. This method is K+ channel, the S4 of the voltage-gated K+ channel will be
valid if all the segments in a sequence share a common first aligned with the S4 of the voltage-gated Ca2+ channel.
evolutionary pathway. However, in TMPs in general, and The order changes at the S5-P-S6 domain; these segments of
ion channel proteins in particular, this assumption in not the K+ channels are aligned with each other first, and only
necessarily valid. Lets take for example the 6TM ion channel than are aligned with the Ca2+ channel and the rest of the

765

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
Y.Shafrir and H.R.Guy

Fig. 5. ClustalW alignment of distantly related 6TM ion channels (A) and STAM’s alignment of the same sequences (B). The helices’
locations in the known structure (KvAP) are underlined. The ovals denote the wide gaps ClustalW introduces in the middle of S2 and S5.
The black rectangles denote the location of S4 of the voltage sensor. ClustalW fails to identify the repetitive motif of charged residues in
the voltage sensor. NCBI accession numbers for the sequences used: gi:30749953 (KvAP), gi:157063 (Shaker), gi:103086 (Ca-activated K),
gi:1787503 (E.coli K), gi:4633669 (T-Type Ca), gi:6751839 (Ascidian Ca), gi:1771576 (Rat Na) and gi:749324 (Yeast Na).

766

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
STAM: simple Transmembrane Alignment Method

Fig. 5. Continued.

767

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
Y.Shafrir and H.R.Guy

Fig. 6. The phylogenic trees that guide the alignments in Figure 5. (A) The phylogenic tree for the ClustalW alignment. This tree is applied
to the entire sequence. (B) The phylogenic tree of the S4 segment in the STAM alignment. The channels with the repetitive arginines are
grouped together. The channels with almost no such motif (The Ca-activated K + and the E.coli K + channels) are also grouped separately.
(C) The phylogenic tree for the P segment in the STAM alignment of Figure 5B. Since the P segment function as the selectivity filter, the
channels are sorted according to their selectivity. Note: in the STAM alignment, KvAP was aligned to the profile of the rest of the sequences
and therefore does not appear in the phylogenic tree.

sequences. The effects of this were demonstrated in Figure 5, REFERENCES


where ClustalW failed to align the arginines in the various Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J.
voltage-gated channels, whereas STAM alignment displayed (1990) Basic local alignment search tool. J. Mol. Biol., 215,
the strong signature of the S4 in the voltage sensor. 403–410.
Alignment speed is also an important factor. Due to the Arai,M., Ikeda,M. and Shimizu,T. (2003) Comprehensive analysis
fact that STAM aligns only fragments rather than complete of transmembrane topologies in prokaryotic genomes. Gene, 304,
sequences the total time required for alignment is reduced. 77–86.
However, a compensating factor is the need to generate a Belrhali,H., Nollert,P., Royant,A., Menzel,C., Rosenbusch,J.P.,
guiding tree for each segment, which is done as described Landau,E.M. and Pebay-Peyroula,E. (1999) Protein, lipid and
water organization in bacteriorhodopsin crystals: a molecular
before by ways of pair-wise aligning the entire collection of
view of the purple membrane at 1.9 Å resolution. Structure Fold
segments. This is not an inherent requirement of STAM but Des., 7, 909–917.
rather a byproduct of using ClustalW code as the alignment Boyd,D., Schierle,C. and Beckwith,J. (1998) How many membrane
engine. Utilizing a different and faster alignment engine such proteins are there? Protein Sci., 7, 201–205.
as DIALIGN (Morgenstern et al., 1998) may yield improved Chen,C.P., Kernytsky,A. and Rost,B. (2002) Transmembrane helix
alignment times, but the alignment quality may suffer. In this predictions revisited. Protein Sci., 11, 2774–2791.
constant battle between alignments speed to alignments accur- Doyle,D.A., Morais,C.J., Pfuetzner,R.A., Kuo,A., Gulbis,J.M.,
acy, we tend towards the cautionary side and prefer the more Cohen,S.L., Chait,B.T. and MacKinnon,R. (1998) The structure
robust algorithm. of the potassium channel: molecular basis of K + conduction and
As a tool, STAM can operate automatically with a set of pre- selectivity. Science, 280, 69–77.
determined parameters, and accomplish the alignment without Durbin,R., Eddy,S., Krogh,A. and Mitchison,G. (1998) Biological
Sequence Analysis: Probabilistic Models of Proteins and Nucleic
any external input. On the other hand, STAM allows the user
Acids. Cambridge Press, Cambridge, UK.
to control every aspect of the alignment procedure and even
Eddy,S.R. (1995) Multiple alignment using hidden Markov models.
assign transmembrane segments manually. Hence, STAM Proc. Int. Conf. Intell. Syst. Mol. Biol., 3, 114–120.
provides a flexible yet efficient application for the analysis Elofsson,A. (2002) A study on protein sequence alignment quality.
of TMPs. Proteins, 46, 330–339.
To summarize, the approach described in this work present a Fariselli,P., Finelli,M., Marchignoli,D., Martelli,P.L., Rossi,I. and
new layer of improvements when aligning TMPs. The special Casadio,R. (2003) MaxSubSeq: an algorithm for segment-length
nature of these proteins is taken into account in a more com- optimization. The case study of the transmembrane spanning
prehensive way that results in more accurate alignments. In segments. Bioinformatics, 19, 500–505.
turn these alignments can provide better seed alignments for Guy,H.R. (1985) Amino-acid side-chain partition energies and
HMM-based alignment programs, provide statistical mean- distribution of residues in soluble-proteins. Biophys. J., 47,
61–70.
ingful information on distantly related sequences and most
Henneke,C.M. (1989) A multiple sequence alignment algorithm
importantly, it can yield more accurate and realistic TMP mod- for homologous proteins using secondary structure information
els that will serve the empirical community in designing more and optionally keying alignments to functionally important sites.
efficient experiments. Comput. Appl. Biosci., 5, 141–150.

768

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018
STAM: simple Transmembrane Alignment Method

Jiang,Y., Lee,A., Chen,J., Cadene,M., Chait,B.T. and MacKinnon,R. Ng,P.C., Henikoff,J.G. and Henikoff,S. (2000) PHAT: a
(2002) Crystal structure and mechanism of a calcium-gated transmembrane-specific substitution matrix. Bioinformatics, 16,
potassium channel. Nature, 417, 515–522. 760–766.
Jiang,Y., Lee,A., Chen,J., Ruta,V., Cadene,M., Chait,B.T. and Nilsson,J., Persson,B. and von Heijne,G. (2002) Prediction of partial
MacKinnon,R. (2003) X-ray structure of a voltage-dependent membrane protein topologies using a consensus approach. Protein
K + channel. Nature, 423, 33–41. Sci., 11, 2974–2980.
Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) A mutation data Overington,J., Donnelly,D., Johnson,M.S., Sali,A. and Blundell,T.L.
matrix for transmembrane proteins, FEBS Lett., 339, 269–275. (1992) Environment-specific amino-acid substitution tables—
Kall,L. and Sonnhammer,E.L. (2002) Reliability of transmembrane tertiary templates and prediction of protein folds. Protein Sci.,
predictions in whole-genome data, FEBS Lett., 532, 415–418. 1, 216–226.
Karplus,K., Barrett,C. and Hughey,R. (1998) Hidden Markov mod- Palczewski,K., Kumasaka,T., Hori,T., Behnke,C.A., Motoshima,H.,
els for detecting remote protein homologies. Bioinformatics, 14, Fox,B.A., Le,T.I, Teller,D.C., Okada,T., Stenkamp,R.E.,
846–856. Yamamoto,M. and Miyano,M. (2000) Crystal structure of
Kolbe,M., Besir,H., Essen,L.O. and Oesterhelt,D. (2000) Structure rhodopsin: a G protein-coupled receptor. Science, 289,
of the light-driven chloride pump halorhodopsin at 1.8 Å resolu- 739–745.
tion, Science, 288, 1390–1396. Persson,B. and Argos,P. (1994) Prediction of transmembrane seg-
Komiya,H., Yeates,T.O., Rees,D.C., Allen,J.P. and Feher,G. (1988) ments in proteins utilizing multiple sequence alignments. J. Mol.
Structure of the reaction center from Rhodobacter sphaeroides Biol., 237, 182–192.
R-26 and 2.4.1: symmetry relations and sequence comparis- Royant,A., Nollert,P., Edman,K., Neutze,R., Landau,E.M., Pebay-
ons between different species. Proc. Natl Acad. Sci. USA, 85, Peyroula,E. and Navarro,J. (2001) X-ray structure of sensory
9012–9016. rhodopsin II at 2.1-Å resolution. Proc. Natl Acad. Sci USA, 98,
Kuo,A., Gulbis,J.M., Antcliff,J.F., Rahman,T., Lowe,E.D., 10131–10136.
Zimmer,J., Cuthbertson,J., Ashcroft,F.M., Ezaki,T. and Smith,R.F. and Smith,T.F. (1992) Pattern-induced multi-sequence
Doyle,D.A. (2003) Crystal structure of the potassium channel alignment (pima) algorithm employing secondary structure-
KirBac1.1 in the closed state. Science, 300, 1922–1926. dependent gap penalties for use in comparative protein modeling.
Kyte,J. and Doolittle,R.F. (1982) A simple method for displaying the Protein Eng., 5, 35–41.
hydropathic character of a protein. J. Mol. Biol., 157, 105–132. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL
Morgenstern,B., Frech,K., Dress,A. and Werner,T. (1998) W: improving the sensitivity of progressive multiple sequence
DIALIGN: finding local similarities by multiple sequence align- alignment through sequence weighting, position-specific gap
ment. Bioinformatics, 14, 290–294. penalties and weight matrix choice. Nucleic Acids Res., 22,
Mount,D.W. (2001) Bioinformatics: Sequence and Genome Ana- 4673–4680.
lysis, Cold Spring Harbor Laboratory, Cold Spring Harbor, Thompson,J.D., Plewniak,F. and Poch,O. (1999) A comprehensive
New York, USA pp 139–204. comparison of multiple sequence alignment programs. Nucleic
Nero,T.L. and Louis,W.J. (1992) Evaluation of the predicted sec- Acids Res., 27, 2682–2690.
ondary structure of bacteriorhodopsin. Prediction of the bovine von Heijne,G. (1992) Membrane protein structure prediction. Hydro-
rhodopsin secondary structure and its sequence similarity with phobicity analysis and the positive-inside rule. J. Mol. Biol., 225,
bacteriorhodopsin. Biochem. Int., 27, 763–770. 487–494.

769

Downloaded from https://academic.oup.com/bioinformatics/article-abstract/20/5/758/214103


by guest
on 03 May 2018

You might also like