Received on August 5, 2003; revised on October 2, 2003; accepted on October 14, 2003
Advance Access publication January 29, 2004
proteins are exposed to water and tend to be hydrophilic. These propensity and conservation differences suggest that substitution matrices that were constructed using alignments of water-soluble proteins as reference data are inappropriate for the analysis of TMPs. Various groups have developed substitution matrices specifically tailored to TMPs, and indeed they differ significantly from the standard matrices (Jones et al., 1994; Ng et al., 2000).

Alignment of sequences within the same family is usually a straightforward process. The sequences’ identity in such cases is high enough to achieve a meaningful alignment without substantial effects due to the choice of substitution matrix or insertion/deletion penalties. However, when the sequence set includes members that belong to the same superfamily but not necessarily to the same family, the identity can fall below the 30% threshold, and the importance of the alignment parameters in obtaining a correct alignment becomes more evident. For example, two ion-channel proteins can belong to two distinct families with little or no obvious sequence identity, yet share the same folding topology. In this situation, standard alignment programs encounter problems, and in many cases fail to align functionally important domains.

Jones and colleagues already wrote in 1994: ‘Considering the differences in the mutability patterns observed between a lipid and non-lipid environment, clearly when trying to align distantly related transmembrane segments it is vital to bear these differences in mind. Alignment programs that use the transmembrane matrix for transmembrane regions (either experimentally determined or predicted) and a general mutation data matrix for the polar flanking regions are likely to perform much better than programs that use a single matrix’. (Jones et al., 1994).

In addition, the question of variable gap penalties as a function of the secondary structure is an old one. In 1989, Henneke published an algorithm that takes into account the known secondary structure of a sequence and adjusts the gap penalty accordingly. In 1992, Smith and Smith published the pattern-induced multi-sequence alignment (PIMA) algorithm and showed that ‘secondary structure information can significantly improve the accuracy of aligning structure boundaries in a set of homologous sequences even when the structure of only one member of the family is known’. However, as was mentioned before, only a handful of TMPs have known structure (Smith and Smith, 1992).

At the same time, many methods of identifying the position of the transmembrane segments in the sequence have been developed. The simplest ones rely on scales that indicate the relative hydrophobicity of the different amino acids. The basic assumption is that long stretches of hydrophobic residues in a TMP correspond with the actual transmembrane segments. Many scales have been suggested based on different approaches, with similar success in predicting the location of the transmembrane parts. Later on, the prediction algorithms shifted towards approaches based on neural networks and adaptive systems. This technique relies on pre-constructed profiles of transmembrane segments and on comparing the target sequences against these profiles. The rate of successful identification is slightly higher in all these methods and depends on the quality of the profile alignment (Chen et al., 2002; Kall and Sonnhammer, 2002).

While the concept of making alignment parameters depend on topology is not new, and has been applied manually (and subjectively) for many years, STAM is the first application to combine secondary structure prediction (not known structure) with variable alignment parameters, in order to address the special problem of multi-sequence alignment of transmembrane proteins. In this sense STAM is different from global alignment programs, in that it breaks the sequences into segments. It is also different from local alignment algorithms in that it maintains the structural integrity of the transmembrane segments without much emphasis on local similarities. It differs from both classes of programs by applying different alignment parameters at different sections of the sequence.

ALGORITHM

The basic structure of the algorithm is described in the flow chart (Fig. 1). The first step is to predict the general topology of each sequence by distinguishing between two types of segments: transmembrane and non-transmembrane. This sorting is achieved by applying improved hydrophobicity analysis (von Heijne, 1992) to the sequences.

Different scales can be used (e.g., Guy, 1985 or Kyte and Doolittle, 1982), as well as different sliding window sizes. Standard hydropathy analysis with a single window has been shown to be unreliable (Fariselli et al., 2003). STAM uses two sliding windows (the trapezoidal method) and a sensitive peak detection method that allows it to identify segments even with a low hydrophobic signature. From our experience, using this improved method, the success rate of correctly identifying the true type of the segment seems to be relatively insensitive to the specific scale used as long as it is a realistic one.

The hydrophobicity-based method is enhanced further by helicity propensity analysis that tries to minimize the number of false positives (over-prediction of transmembrane segments) or false negatives (under-prediction). For example, many hydropathy-based programs often fail to identify the S4 segment in voltage-gated 6TM ion channels. The added helicity analysis helps STAM to identify it in most cases. This propensity scale is described in the work by Persson and Argos (1994).

STAM’s goal is not to predict accurately the transmembrane topology of all the sequences in the sequence set, but rather to achieve a biophysically meaningful alignment. As long as a sizable number of the sequences have the same topology and are cataloged as such, the resulting alignment will not exhibit dependence on the individual sequences that comprise the profile. Thus, the method of establishing
the transmembrane topology could be any of the common algorithms available today or that will become available in the future. In addition to this automated prediction and assignment step, we gave the user the option to impose his/her own prediction before the alignment step, using a combination of methods and a ‘majority consensus vote’ to obtain better confidence in the prediction (Nilsson et al., 2002), or preferably experimentally obtained data.

STAM should not be used as a general alignment program. Water-soluble proteins that have core hydrophobic regions will be misidentified and cause major alignment errors. For water-soluble proteins, the standard alignment methods are adequate, and should be used. When TMPs are aligned (with STAM or any other alignment program), it is practical to separate the transmembrane region from the large soluble domains. This is done for two main reasons: (1) soluble domains are often highly variable (among different families) and therefore their alignment is meaningless from a modeling point of view. (2) Large soluble domains typically have properties of water-soluble proteins, thus making them poor candidates for alignment by a program that was designed specifically for transmembrane regions.

Once the hydrophobic segments have been resolved, an additional process of ‘segments’ conditioning’ is performed. This process takes into account the propensity of the residues at the edges of the segments to be part of a transmembrane helix (Persson and Argos, 1994).

Fig. 1. The flowchart of STAM’s algorithm. See text for more details.

STAM then sorts the predicted topologies into distribution bins. In the best case, if all sequences share the same topology, the distribution will include only one bin. The sequences that belong to the distribution bin that has most members will constitute a basis for the overall alignment. The rest of the sequences are stored for future alignment.

The segments are then aligned in sequential order. All the corresponding transmembrane segments are aligned together under very restrictive conditions. The gap-opening penalty is set to a very high value and the substitution matrix is set to reflect the different substitution probabilities in a transmembrane region. From this alignment, a transmembrane core is extracted. This core is the block of columns that have no gaps at any positions. The ‘fringe residues’ are then chopped and assigned to the surrounding loop regions. This ensures that errors in identifying the edges of the transmembrane segments may have a second opportunity to be aligned correctly (see the example in Fig. 2). STAM also detects segments that diverge significantly from the block form (such as to make the core region smaller than a preset number of residues) and removes them from the alignment on the assumption that they are false positives.

Rather than reinvent the wheel, we used the ClustalW algorithm (Thompson et al., 1994) to align the segments, though the user can easily modify the source code to use a different alignment method. We chose ClustalW since the source code is freely available and allows the user to modify ClustalW parameters (such as the gap extension penalty), and its integration into STAM was fairly easy. However, studies have shown that the differences between various alignment programs are comparatively small (Elofsson, 2002; Thompson et al., 1999).

Once all the transmembrane segments are aligned, the modified loop segments are analyzed. The process is identical to the alignment process of the transmembrane segments. This time, however, the alignment parameters are more lax. The penalty for opening a gap is substantially lower and the substitution matrices are the standard PAM or BLOSUM.

With this, the major part of the alignment process is complete. The aligned segments are recombined into aligned sequences to compose a profile. The remainder of the sequences that did not participate in the previous alignment procedure (due to a different identified topology) are now aligned sequentially to the profile. By imposing a secondary structure
mask on the profile, we take advantage of this feature in many profile-sequence alignment programs, such as ClustalW (Thompson et al., 1994) or Hmmer (Durbin et al., 1998; Eddy, 1995).

STAM is designed to reduce the impact of topology prediction errors on the final alignment. There are two common prediction errors: false positives, which are the assignment of transmembrane status to a loop or the breakage of a long segment into two shorter ones, and false negatives, which are the assignment of loop status to a transmembrane segment. In the vast majority of cases this will lead the sequence to be cataloged in a different distribution bin than the majority of the sequences (and therefore it will not be part of the core profile) unless all the sequences share the same characteristics. If they do, the consistent nature of the prediction algorithm will make the same prediction mistake in all of them and the alignment will not be much affected. For that reason, the consistency of the topology analysis is key to reducing the effect misprediction will have on the overall alignment. Still, unidentified false positives disrupt the final alignment more than false negatives, since regions identified as loops are aligned again after the putative transmembrane helices. Consequently, STAM will choose the consensus topology with the least number of segments between two bins with the same number of sequences. The last defense against prediction errors is the removal from the core alignment of segments that cause the core helical region to be smaller than a preset number of residues. These misalignments are often the result of a prediction mistake along the sequence. However, even with these measures, prediction mistakes can affect the final alignment. Hence, the user is encouraged to inspect the prediction results and correct them manually if an error is detected.

STAM was implemented in Microsoft Visual Basic version 6 and currently is available by request from the author for Windows 2000/XP-based machines (and will be available on our future website). We currently are working to port the algorithm to more environments.

RESULTS

Comparing the accuracy of alignments is a difficult task even when the sequences involved are of globular proteins. A comparison is usually made between the proteins’ sequence alignments and the proteins’ three-dimensional (3D) structural alignments. This approach is not adequate when TMPs are concerned. The number of known structures of TMPs is very small, and the subset of known TMP structures that are homologs is almost empty.

However, we first examined our method versus ClustalW by aligning sequences from families for which at least one structure has been determined. We looked at three different major improvements that STAM introduces to the alignment of TMPs.

Correct positioning of the helices

We compared the STAM alignment of eukaryotic (bovine) rhodopsin (Palczewski et al., 2000), and three bacterial rhodopsins: bacteriorhodopsin (Belrhali et al., 1999), halorhodopsin (Kolbe et al., 2000) and sensory rhodopsin II (Royant et al., 2001).

Even though the structures of the prokaryotic and eukaryotic families are distant from one another, they share many structural characteristics (Nero and Louis, 1992). Both families have a seven transmembrane helix motif, but the bacteriorhodopsin helices tend to be shorter and more perpendicular to the membrane.

As we can see in Figure 3A, STAM identified correctly all the transmembrane segments and aligned them without gaps in their helical cores. Figure 3B is the ClustalW alignment of the same sequences. While ClustalW produced essentially the same alignment for the three bacterial sequences, it failed to position the helices between the families correctly. The first three helices of the bovine rhodopsin are completely misaligned with the helices of the three bacterial rhodopsins. It is important to note that STAM identified the topology of all the sequences correctly using the KD hydrophobicity scale, and failed to do so using Guy’s scale. Yet, even with the misidentification of the topology, STAM managed to align the sequences correctly because two sequences were enough to build a meaningful secondary-structure mask to include in the profile/sequence alignment procedure.

Insertions/deletions in known helices

Crystal structures have been determined for the pore-forming domain of four distantly related bacterial K+ channels: KcsA (Doyle et al., 1998), MthK (Jiang et al., 2002), KirBac1.1 (Kuo et al., 2003) and KvAP (Jiang et al., 2003). Figure 4 compares the alignments obtained using STAM and ClustalW of these proteins and the three closest homologs to each that have <50% identity to each other, as was determined by a BLAST search (Altschul et al., 1990) of the non-redundant database. Due to their low level of sequence conservation, the more peripheral M1 helices (of the 2-TM channels) and the S5 helix of KvAP are difficult to align. ClustalW (Fig. 4A) introduces gaps in the known helices in the crystallized structures, and in the putative helical regions of some of the other sequences. Based on the crystal structures, both programs misaligned M1 of the KirBac family relative to the other families. However, while STAM places the M1 of the KirBac against the M1 of the other families, ClustalW places it mostly against insertions in the other sequences. If one were to model KirBac on the backbone of any of the other families, using STAM will retain the secondary structure of the sequence while using ClustalW will result in a disorganized coil spanning the lipid bilayer.

Sensitivity to motifs in distantly related sequences

The final example is the alignment of a small number of distantly related 6TM ion channels. ClustalW performs relatively
Fig. 3. (A) STAM alignment of rhodopsins. (B) ClustalW alignment of the same sequences. The helices’ locations in the crystal structures are denoted by a black underline. The PDB codes of the structures are as follows: bacteriorhodopsin, 1QHJ; halorhodopsin, 1E12; sensory rhodopsin II, 1H68; and bovine rhodopsin, 1F88.
well when the number of sequences is large and the proteins are somewhat closely related. Since ClustalW obtains its final alignment by building a phylogenic guiding tree and clustering the sequences accordingly, the quality of the final alignment depends on the number of input sequences and their relative evolutionary distances from one another. When the number of sequences is small and the sequences are very distant, the quality of the alignment deteriorates. In this case, we aligned a small number of 6TM ion channels that are very distant from one another.

Eight different 6TM sequences were aligned: four K+ channels, two Ca2+ channels and two Na+ channels. KvAP and Shaker were introduced to the alignment since KvAP is the only known structure of a 6TM channel and Shaker is one of the most-studied channels. The results are shown in Figure 5. In this alignment, ClustalW failed to identify the signature repetitive motif of arginines at every third position in the S4 helix of the voltage-sensitive channels, which is responsible for the charge movement through the membrane during gating. The highest relative amount of arginine or lysine in an alignment position is 3/8 [the Escherichia coli segment does not have the (RXX)n signature]. In addition, some of the other helices are misaligned as well (especially S5). In this example, STAM avoids these mistakes. It aligns the S4 region well (7/8 ratio in three positions and 4/8 in one other position). Gaps are absent from the putative transmembrane regions and concentrate at the highly variable loop regions or at the edges of the helices.

DISCUSSION

When aligning sequences of TMPs, there are added difficulties to the ones that are already associated with multi-sequence alignment algorithms. Instead of developing a new alignment algorithm, we constructed an algorithm that uses contemporary alignment procedures, and applies certain criteria to treat the special cases of TMPs. It has been shown already that most available alignment programs today achieve about the same rate of success regardless of the underlying algorithms (Thompson et al., 1999). The basic concept behind this project was not to reinvent the wheel when it comes to multi-sequence alignments, but rather to use the available tools and
techniques and apply the restrictions arising from the unique physico-chemistry of TMPs.

We chose not to base our program on the hidden Markov model (HMM) approach because HMMs necessitate an initial seed alignment on which the final alignment will be constructed. This requires the user to provide a good initial alignment for the TMPs in question. If the seed alignment is unambiguous, no complications arise. Things become more problematic when the sequences in question have little overlap and are highly variable. Then the user has to inject his/her subjective criteria and this in turn will propagate throughout the final alignment.

We prefer to use the more conventional approach of matrix-based alignments and to use standard criteria that are general to TMPs and based on empirical results (such as the hydrophobic scales).

Yet HMMs are extremely fast and powerful tools that need a good initial alignment. STAM may be just the objective tool to provide these initial alignments.

As its name implies, the STAM approach is very simplistic in nature and not perfect. The method is based on the following assumptions: (1) long hydrophobic segments form transmembrane α-helices and (2) short segments between the helices have no regular secondary structure and long water-exposed segments can be aligned with standard unconditioned methods. As one can see, these are very basic assumptions that cover most of the TMP cases (but not all). Once the identification of the hydrophobic regions is complete, the segments are aligned according to their classification. Transmembrane segments are aligned with high gap penalties to prevent distortions in the secondary structure and with a substitution matrix that represents the substitution probabilities in a transmembrane segment. However, we must note that the gap penalty value is more important to the alignments than is the selected matrix. This may reflect the situation today, in which only a handful of substitution matrices tailored specifically for TMPs exist. To compound this problem, most of these matrices have close entropy values due to their being developed from closely related proteins. They therefore represent only a narrow band in the evolution process of TMPs. STAM can be the tool needed to align more distant sequences in order to obtain more distant matrices.

Even this naive approach has shown improvement over standard alignment programs. We chose to compare its results versus those of ClustalW for two reasons: (1) ClustalW is one of the most successful and popular alignment programs.
(2) We used the ClustalW algorithm as the alignment engine and, therefore, it is reasonable to observe the changes STAM brings to a ClustalW alignment. However, STAM is a very modular program and replacing the alignment engine with the user’s favorite alignment program is a straightforward procedure. For example, we found that at the last phase of the alignment, when the problem type becomes a sequence-profile alignment, SAM (Karplus et al., 1998), which is an HMM-based program, performs better than ClustalW in some cases.

STAM does not guarantee a ‘correct’ alignment and sometimes it exhibits sensitivity to different input parameters. STAM is very flexible in this regard. It can either work in full automatic mode (in which no input from the user is required), or manual mode (in which the different parameters and topology predictions can be altered by users). Users are advised to consider alternative alignments obtained with different inputs.

Nonetheless, the method that is the basis of STAM strives to minimize the effects of possible mistakes. Since the segments are aligned individually, there is no drift of misalignment errors to the neighboring segments. A major misalignment error in transmembrane proteins is the existence of a gap in the transmembrane regions. ClustalW tends to prefer to extend gaps. If one gap is opened, the cost of opening an adjacent gap drops significantly. Though single gaps can occur in the transmembrane segments, longer gaps are rare. STAM assigns the same penalty to the extension gap as to the initial one. This discourages the appearance of long stretches of gaps in a transmembrane segment. Because of this, most of the misalignments will appear at the ends of the helices and not in the core. This is due to errors in the identification of the helices’ edges. Since proteins tend to lose their ordered secondary structure near the membrane boundaries, this is an acceptable error.
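The effect of charging gap extension as much as gap opening can be seen with a small arithmetic sketch. The functions `gap_runs` and `gap_cost`, the toy row and the penalty values below are hypothetical illustrations, not STAM or ClustalW code or defaults.

```python
def gap_runs(row, gap_char="-"):
    """Return the lengths of the consecutive gap runs in one aligned row."""
    runs, current = [], 0
    for ch in row:
        if ch == gap_char:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def gap_cost(row, gap_open, gap_extend):
    """Each run pays gap_open once, then gap_extend for every further column."""
    return sum(gap_open + (n - 1) * gap_extend for n in gap_runs(row))


# Toy aligned transmembrane row with one 4-column gap and one 1-column gap.
row = "LLV----AIG-F"

# Assumed penalties: a cheap extension versus extension equal to opening.
affine = gap_cost(row, gap_open=10, gap_extend=1)
flat = gap_cost(row, gap_open=10, gap_extend=10)
print(affine, flat)
```

With a cheap extension the long gap costs little more than the single-column gap, whereas with equal penalties its cost grows in proportion to its length, which is the behaviour the text relies on to keep long gaps out of helices.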
Fig. 4. ClustalW alignment (A) and STAM alignment (B) of the pore-forming domain of different K+ channels with solved structure. The first helices in the known structures are underlined. ClustalW opens gaps in many sequences at the first transmembrane helix (S5 in KvAP, M1 in all others), making homology modeling based on its alignment a difficult task. In comparison, STAM maintains the structural integrity of the helices. The NCBI accession numbers and PDB codes for sequences used in this alignment: 1ORQ (KvAP), gi:15024246 (KvAP-Like1), gi:16411529 (KvAP-Like2), gi:24350062 (KvAP-Like3), 1BL8 (KcsA), gi:27364075 (KcsA-Like1), gi:9657586 (KcsA-Like2), gi:22406479 (KcsA-Like3), 1LNQ (MthK), gi:1498918 (MthK-Like1), gi:23129256 (MthK-Like2), gi:2648884 (MthK-Like3), 1P7B (KirBac), gi:17130300 (KirBac-Like1), gi:23042578 (KirBac-Like2) and gi:23015860 (KirBac-Like3).
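The transmembrane core described in the ALGORITHM section (the block of columns with no gaps, with fringe residues handed back to the loops) can be sketched as follows. This is a minimal illustration under stated assumptions: the core is taken to be the longest contiguous run of gap-free columns, `min_core` stands in for the unspecified preset number of residues, and the function names and toy rows are hypothetical.

```python
def gap_free_core(rows, gap_char="-"):
    """Return (start, end) of the longest run of gap-free columns; end is exclusive."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned rows must have the same length")
    best = (0, 0)
    start = None
    # One extra iteration (col == width) closes a run that reaches the last column.
    for col in range(width + 1):
        clean = col < width and all(r[col] != gap_char for r in rows)
        if clean and start is None:
            start = col
        elif not clean and start is not None:
            if col - start > best[1] - best[0]:
                best = (start, col)
            start = None
    return best


def split_core_and_fringe(rows, min_core=4, gap_char="-"):
    """Split each row into (left fringe, core, right fringe), gaps stripped from fringes.

    Returns None when the core is shorter than min_core (an assumed cutoff).
    """
    start, end = gap_free_core(rows, gap_char)
    if end - start < min_core:
        return None
    return [
        (r[:start].replace(gap_char, ""), r[start:end], r[end:].replace(gap_char, ""))
        for r in rows
    ]


# Toy aligned transmembrane segment (invented residues).
rows = ["--LLVAIG", "A-LLIAVG", "ALLLVA--"]
print(gap_free_core(rows))
print(split_core_and_fringe(rows))
```

The fringe strings returned here would be prepended or appended to the neighbouring loop segments before the more permissive loop alignment is run.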
Another aspect that STAM brings to the forefront is the importance and relevancy of the alignment score. Most programs treat the alignment score as a function that has to be maximized. Once a maximal value has been attained, the program declares the corresponding alignment as ‘optimal’. This approach is somewhat misleading. Since the program does not know ‘a-priori’ the secondary structure of the sequence, it cannot take it into account when maximizing the score function. This can lead to major misalignments of helices and breaking up the secondary structure. STAM does not try to maximize the score over the entire sequences but forces the helices’ cores to be aligned first. The alignments may not look as ‘nice’ and be less in a ‘block’ form but they will retain the biophysical properties of the sequences.

The ClustalW approach of building a phylogenic tree enhances the accuracy of the alignments by aligning the closest sequences first before progressing to more distant ones. This is done by pairwise alignment of all the sequences and measuring the percentage of matched positions. This method is valid if all the segments in a sequence share a common evolutionary pathway. However, in TMPs in general, and ion channel proteins in particular, this assumption is not necessarily valid. Let us take for example the 6TM ion channel fold. The pore-lining helices (S5-P-S6) of K+ channels exhibit a high degree of conservation since these segments control the channel selectivity and ion permeation. Comparing different potassium channels based on the pore-lining helices points unambiguously to a common ancestor. Nevertheless, this domain is much less conserved among channels with different ion selectivity, e.g., the pore-forming domains of voltage-gated Ca2+ and K+ channels are quite dissimilar. On the other hand, the composition of the preceding four helices (S1–S4) depends on the specific gating mechanism of the channels. Thus, this domain is fairly similar for voltage-gated K+, Ca2+ and Na+ channels but can be very divergent among K+ channels that are gated differently (Fig. 5).

STAM solves this dilemma by generating a new phylogenic tree for each segment’s alignment. This ensures that more similar segments will be aligned first (Fig. 6). In the example above, if a 6TM voltage-gated K+ channel is aligned with a 6TM voltage-gated Ca2+ channel and a 6TM Ca-activated K+ channel, the S4 of the voltage-gated K+ channel will be first aligned with the S4 of the voltage-gated Ca2+ channel. The order changes at the S5-P-S6 domain; these segments of the K+ channels are aligned with each other first, and only then are aligned with the Ca2+ channel and the rest of the
Fig. 5. ClustalW alignment of distantly related 6TM ion channels (A) and STAM’s alignment of the same sequences (B). The helices’
locations in the known structure (KvAP) are underlined. The ovals denote the wide gaps ClustalW introduces in the middle of S2 and S5.
The black rectangles denote the location of S4 of the voltage sensor. ClustalW fails to identify the repetitive motif of charged residues in
the voltage sensor. NCBI accession numbers for the sequences used: gi:30749953 (KvAP), gi:157063 (Shaker), gi:103086 (Ca-activated K),
gi:1787503 (E.coli K), gi:4633669 (T-Type Ca), gi:6751839 (Ascidian Ca), gi:1771576 (Rat Na) and gi:749324 (Yeast Na).
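The ratios quoted in the text for the S4 region (for example 7/8 or 3/8) are per-column fractions of arginine or lysine. A minimal sketch of that bookkeeping is given below; the function names, the 0.5 threshold and the four toy rows are hypothetical and are not taken from the alignments in Figure 5.

```python
def basic_fraction(rows, basic="RK"):
    """Fraction of rows carrying R or K in each alignment column (gaps count as non-basic)."""
    n = len(rows)
    return [sum(r[c] in basic for r in rows) / n for c in range(len(rows[0]))]


def motif_columns(fractions, threshold=0.5):
    """Indices of the columns whose basic-residue fraction reaches the threshold."""
    return [i for i, f in enumerate(fractions) if f >= threshold]


# Invented S4-like rows with a charged residue at every third position.
rows = ["RVLRLLR", "KVIRALR", "RLARIFK", "LVVRSLQ"]

fractions = basic_fraction(rows)
columns = motif_columns(fractions)
spacing = [b - a for a, b in zip(columns, columns[1:])]
print(fractions, columns, spacing)
```

A constant spacing of three between the enriched columns is what an aligned (RXX)n motif would look like; an alignment that scatters the charged residues over different columns would give low fractions everywhere.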
Fig. 6. The phylogenic trees that guide the alignments in Figure 5. (A) The phylogenic tree for the ClustalW alignment. This tree is applied to the entire sequence. (B) The phylogenic tree of the S4 segment in the STAM alignment. The channels with the repetitive arginines are grouped together. The channels with almost no such motif (the Ca-activated K+ and the E.coli K+ channels) are also grouped separately. (C) The phylogenic tree for the P segment in the STAM alignment of Figure 5B. Since the P segment functions as the selectivity filter, the channels are sorted according to their selectivity. Note: in the STAM alignment, KvAP was aligned to the profile of the rest of the sequences and therefore does not appear in the phylogenic tree.
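The idea of a separate guide order for each segment can be illustrated by picking, per segment, the pair of sequences with the highest identity. The sketch below makes strong simplifying assumptions: the segments are compared position by position without a pairwise alignment, only the first merge of the tree is computed, and the channel names and residue strings are invented toy data rather than the sequences of Figure 5.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length segments only")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_pair(segment):
    """Return the two sequence names whose copies of this segment are most similar."""
    return max(
        combinations(segment, 2),
        key=lambda pair: identity(segment[pair[0]], segment[pair[1]]),
    )


# Invented toy segments: Kv and Cav share a charged S4, Kv and KCa share a pore motif.
segments = {
    "S4": {"Kv": "RVIRLVRVFR", "Cav": "RVLRLLRVLR", "KCa": "LVIGLASVFA"},
    "P": {"Kv": "TTVGYGDMYP", "Cav": "TLEGWTDVLY", "KCa": "TTVGYGDVYA"},
}

first_merge = {name: closest_pair(seg) for name, seg in segments.items()}
print(first_merge)
```

In this toy case the first merge differs between the two segments, which mirrors the change of order between S4 and the pore region described in the text.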
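The hydrophobicity analysis outlined in the ALGORITHM section can be illustrated with a weighted sliding window over the Kyte and Doolittle (1982) scale. The sketch below is not STAM's code: the trapezoid weighting (flat core, linearly decaying edges), the window sizes, the 1.6 threshold and the 15-residue minimum length are assumptions chosen for illustration, and all function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def trapezoid_weights(core_half, wedge):
    """Weight 1.0 within core_half of the centre, then a linear decay over `wedge` residues."""
    half = core_half + wedge
    weights = []
    for offset in range(-half, half + 1):
        d = abs(offset)
        weights.append(1.0 if d <= core_half else 1.0 - (d - core_half) / (wedge + 1))
    return weights


def hydropathy_profile(seq, core_half=5, wedge=5):
    """Weighted mean hydropathy around each residue; windows are truncated at the ends."""
    weights = trapezoid_weights(core_half, wedge)
    half = core_half + wedge
    profile = []
    for i in range(len(seq)):
        total = norm = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(seq):
                total += w * KD[seq[j]]
                norm += w
        profile.append(total / norm)
    return profile


def candidate_segments(profile, threshold=1.6, min_len=15):
    """Half-open (start, end) runs where the profile stays at or above the threshold."""
    segments, start = [], None
    # The trailing -inf sentinel closes a run that reaches the end of the sequence.
    for i, value in enumerate(profile + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


# Toy sequence: a 22-residue leucine stretch flanked by aspartates.
seq = "D" * 20 + "L" * 22 + "D" * 20
profile = hydropathy_profile(seq)
segments = candidate_segments(profile)
print(segments)
```

Because the averaged profile blurs the segment boundaries, the detected run is shorter than the hydrophobic stretch itself, which is one reason the edges need the later conditioning and fringe-reassignment steps.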
Jiang,Y., Lee,A., Chen,J., Cadene,M., Chait,B.T. and MacKinnon,R. (2002) Crystal structure and mechanism of a calcium-gated potassium channel. Nature, 417, 515–522.
Jiang,Y., Lee,A., Chen,J., Ruta,V., Cadene,M., Chait,B.T. and MacKinnon,R. (2003) X-ray structure of a voltage-dependent K+ channel. Nature, 423, 33–41.
Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) A mutation data matrix for transmembrane proteins. FEBS Lett., 339, 269–275.
Kall,L. and Sonnhammer,E.L. (2002) Reliability of transmembrane predictions in whole-genome data. FEBS Lett., 532, 415–418.
Karplus,K., Barrett,C. and Hughey,R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14, 846–856.
Kolbe,M., Besir,H., Essen,L.O. and Oesterhelt,D. (2000) Structure of the light-driven chloride pump halorhodopsin at 1.8 Å resolution. Science, 288, 1390–1396.
Komiya,H., Yeates,T.O., Rees,D.C., Allen,J.P. and Feher,G. (1988) Structure of the reaction center from Rhodobacter sphaeroides R-26 and 2.4.1: symmetry relations and sequence comparisons between different species. Proc. Natl Acad. Sci. USA, 85, 9012–9016.
Kuo,A., Gulbis,J.M., Antcliff,J.F., Rahman,T., Lowe,E.D., Zimmer,J., Cuthbertson,J., Ashcroft,F.M., Ezaki,T. and Doyle,D.A. (2003) Crystal structure of the potassium channel KirBac1.1 in the closed state. Science, 300, 1922–1926.
Kyte,J. and Doolittle,R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 157, 105–132.
Morgenstern,B., Frech,K., Dress,A. and Werner,T. (1998) DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics, 14, 290–294.
Mount,D.W. (2001) Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA, pp. 139–204.
Nero,T.L. and Louis,W.J. (1992) Evaluation of the predicted secondary structure of bacteriorhodopsin. Prediction of the bovine rhodopsin secondary structure and its sequence similarity with bacteriorhodopsin. Biochem. Int., 27, 763–770.
Ng,P.C., Henikoff,J.G. and Henikoff,S. (2000) PHAT: a transmembrane-specific substitution matrix. Bioinformatics, 16, 760–766.
Nilsson,J., Persson,B. and von Heijne,G. (2002) Prediction of partial membrane protein topologies using a consensus approach. Protein Sci., 11, 2974–2980.
Overington,J., Donnelly,D., Johnson,M.S., Sali,A. and Blundell,T.L. (1992) Environment-specific amino-acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci., 1, 216–226.
Palczewski,K., Kumasaka,T., Hori,T., Behnke,C.A., Motoshima,H., Fox,B.A., Le,T.I, Teller,D.C., Okada,T., Stenkamp,R.E., Yamamoto,M. and Miyano,M. (2000) Crystal structure of rhodopsin: a G protein-coupled receptor. Science, 289, 739–745.
Persson,B. and Argos,P. (1994) Prediction of transmembrane segments in proteins utilizing multiple sequence alignments. J. Mol. Biol., 237, 182–192.
Royant,A., Nollert,P., Edman,K., Neutze,R., Landau,E.M., Pebay-Peyroula,E. and Navarro,J. (2001) X-ray structure of sensory rhodopsin II at 2.1-Å resolution. Proc. Natl Acad. Sci. USA, 98, 10131–10136.
Smith,R.F. and Smith,T.F. (1992) Pattern-induced multi-sequence alignment (pima) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modeling. Protein Eng., 5, 35–41.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
Thompson,J.D., Plewniak,F. and Poch,O. (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res., 27, 2682–2690.
von Heijne,G. (1992) Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J. Mol. Biol., 225, 487–494.