Professional Documents
Culture Documents
One well-known limitation of the Zuker algor- best suited to folding sequences for which a pre-
ithm is that it is incapable of predicting so-called vious multiple alignment exists, so that scores may
RNA pseudoknots. This is the problem that we be assigned to possible base-pairs by comparative
address here. analysis. It is not clear to us that the MWM algor-
The thermodynamic model for non-pseudo- ithm will be amenable to folding single sequences
knotted RNA secondary structure includes some using the relatively complicated Turner thermo-
stereotypical interactions, such as stacked base- dynamic model. However, we believe that this was
paired stems, hairpins, bulges, internal loops, and the ®rst work that indicated that optimal RNA
multiloops. Formally, non-pseudoknotted struc- pseudoknot predictions can be made with poly-
tures obey a ``nesting'' convention: that for any nomial time algorithms. It had been widely
two base-pairs i, j and k, l (where i < j, k < l and believed, but never proven, that pseudoknot pre-
i < k), either i < k < l < i or i < j < k < l. It is precisely diction would be an NP problem (NP, non-deter-
this nesting convention that the Zuker dynamic ministic polynomial; e.g. only solvable by heuristic
programming algorithm relies upon to recursively or brute force approaches).
calculate the minimal energy structure on progress- Here, we describe a dynamic programming
ively longer subsequences. An RNA pseudoknot is algorithm which ®nds optimal pseudoknotted
de®ned as a structure containing base-pairs which RNA structures. We describe the algorithm using
violate the nesting convention. An example of a a diagrammatic representation borrowed from
simple pseudoknot is shown in Figure 1. quantum ®eld theory (Feynman diagrams). We
RNA pseudoknots are functionally important in implement a version of the algorithm that ®nds
several known RNAs (ten Dam et al., 1992). For minimal energy RNA structures using the standard
example, by comparative analysis, RNA pseudo- RNA secondary structure thermodynamic model
knots are conserved in ribosomal RNAs, the cataly- (Freier et al., 1986, Serra & Turner, 1995), augmen-
tic core of group I introns, and RNase P RNAs. ted by a few pseudoknot-speci®c parameters that
Plausible pseudoknotted structures have been pro- are not yet available in the standard folding par-
posed (Pleij et al., 1985), and recently con®rmed ameters, and by coaxial stacking energies (Walter
(Kolk et al., 1998) for the 30 end of several plant et al., 1994) for both pseudoknotted and non-pseu-
viral RNAs, where pseudoknots are apparently doknotted structures. We demonstrate the proper-
used to mimic tRNA structure. In vitro RNA evol- ties of the algorithm by testing it on several small
ution (SELEX) experiments have yielded families RNA structures, including both structures thought
of RNA structures which appear to share a com- to contain pseudoknots and structures thought not
mon pseudoknotted structure, such as RNA to contain pseudoknots.
ligands selected to bind HIV-1 reverse transcriptase
(Tuerk et al., 1992).
Most methods for RNA folding which are
Algorithm
capable of folding pseudoknots adopt heuristic Here, we will introduce a diagrammatic way of
search procedures and sacri®ce optimality. representing RNA folding algorithms. We will
Examples of these approaches include quasi-Monte start by describing the Nussinov algorithm
Carlo searches (Abrahams et al., 1990) and genetic (Nussinov et al., 1978), and the Zuker-Sankoff
algorithms (Gultyaev et al., 1995; van Batenburg algorithm (Zuker & Sankoff, 1984; Sankoff, 1985)
et al., 1995). These approaches are inherently in the context of this representation. Later on we
unable to guarantee that they have found the will extend the diagrammatic representation to
``best'' structure given the thermodynamic model, include pseudoknots and coaxial stackings. The
and consequently unable to say how far a given Nussinov and Zuker-Sankoff algorithms can be
prediction is from optimality. implemented without the diagrammatic represen-
A different approach to pseudoknot prediction tation, but this representation is essential to man-
based on the maximum weighted matching age the complexity introduced by pseudoknots.
(MWM) algorithm (Edmonds, 1965; Gabow, 1976)
was introduced by Cary & Stormo (1995) and Preliminaries
Tabaska et al. (1998). Using the MWM algorithm,
an optimal structure is found, even in the presence From here on, unless otherwise stated, a ¯at
of complicated knotted interactions, in O(N3) time continuous line will represent the backbone of an
and O(N2) space. However, MWM currently seems RNA sequence with its 50 -end placed in the left-
hand side of the segment. N will represent the
length (in number of nucleotides) of the RNA.
Secondary interactions will be represented by
wavy lines connecting the two interacting positions
in the backbone chain, while the backbone itself
always remains ¯at. No more than two bases are
allowed to interact at once. This representation
Figure 1. A simple pseudoknot. In a pseudoknot, does not provide insight about real (three-dimen-
nucleotides inside a hairpin loop pair with nucleotides sional) spatial arrangements, but is very con-
outside the stem-loop. venient for algorithmic purposes. When necessary
RNA Pseudoknot Prediction by Dynamic Programming 2055
Figure 2. Diagrammatic representation of the most relevant RNA secondary structures, including a pseudoknot.
The nucleotides of the sequence are represented by dots. Single-stranded regions (SS) are not involved in any second-
ary structure. A hairpin (H) is a sequence of unpaired bases bounded by one base-pair. Stems (S), bulges (B) and
internal loops (IL) are all nested structures bounded by two base-pairs. In a stem, the two base-pairs are contiguous
at both ends. In a bulge, the two base-pairs are contiguous only at one end. In an internal loop, the two base-pairs
are not contiguous at all. Multiloops (M) refer to any structure bounded by three or more base-pairs. Any non-nested
structure is referred to as a pseudoknot.
2056 RNA Pseudoknot Prediction by Dynamic Programming
8
< EIS1
i; j IS1
2
vx
i; j optimal EIS
i; j : k; l vx
k; l IS2
4
:
PI M wxI
i 1; k wxI
k 1; j ÿ 1 multiloop
8k; l i 4 k 4 l 4 j
sion in ISs at some order in the recursion for vx in M stands for the score for generating a multiloop.
Figure 4. The symbol O(g) indicates the order of ISg The Turner thermodynamic rules also penalize an
at which we truncate the recursion. amount for each closing pair in a multiloop. By
These recursions are equivalent to those pro- starting a multiloop we are specifying already one
posed by Sankoff (1985) in theorem 2. Notice also of its closing pairs; this closing-pair score is rep-
that in de®ning the recursive algorithm we have resented here by PI.
not yet had to specify anything about the particu- The recursion relations used to ®ll the wx matrix
lar manner in which the contribution from differ- include: single-stranded nucleotides, external pairs,
ent ISs are calculated in order to obtain the most and bifurcations. The actual recursion is easier to
optimal folding. understand by looking at the diagrams involved
The simplest truncation is to stop at order zero, (given in Figure 7) and the recursion can be
O(0). In this approximation none of the ISs (hair- expressed as:
8
> P vx
i; j paired
>
>
>
<
Q wx
i 1; j
wx
i; j optimal single-stranded
5
>
> Q wx
i; j ÿ 1
>
>
: wx
i; k wx
k 1; j 8k; i 4 k 4 j: bifurcation
RNA Pseudoknot Prediction by Dynamic Programming 2057
8
>
> EIS1
i; j IS
1
>
>
>
>
>
> EIS2
i; j : k; l vx
k; l IS
2
>
>
<
vx
i; j optimal P M wx
i 1; k wx
k 1; j ÿ 1 nested
8
>
> I I I multiloop
>
>
>
>
>
> e M~ GwI whx
i 1; r : k; l
>
> P non-nested
: I whx
K 1; j ÿ 1 : l ÿ 1; r 1 multiloop
8i; k; l; r; j i 4 k 4l 4r 4j
8
> P vx
i; j paired
>
>
>
>
>
> Q wx
i 1; j
>
> single-stranded
>
>
>
>
< Q wx
i; j ÿ i
wx
i; j optimal
9
>
> nested
> wx
i; k wx
k 1; j
> bifurcation
>
>
>
>
>
>
>
> Gw whx
i; r : k; l non-nested
>
: bifurcation
whx
k 1; j : l ÿ 1; r 1
Where Gw denotes the score for introducing a ABCABC (the topology present in Escherichia coli a
pseudoknot. We should also remember that the mRNA; Gluick et al., 1994), and others. However,
algorithm uses two different wx matrices depend- the algorithm is not able to solve all possible
ing on whether the subset i . . . j is free-standing knotted con®gurations, as for instance a parallel
(wx) or appears inside a multiloop (in which case b-sheet protein interaction ABCADBECD (see
we use wxI). The two recursions are identical Figure 12 for some details.) For a given con®gur-
apart from having different parameter values as ation we can decide unambiguously whether it is
described in Table 2. solvable or not by parsing it according to the
Practical considerations make us truncate the model. However, we still lack a systematic a priori
expansion at this stage; we will not include dia- characterization of the class of con®gurations that
grams that require three or more gap matrices. this algorithm can solve.
This statement should not mislead one into think- Note that two approximations are involved in
ing that we cannot deal with complicated pseu- the algorithm. Apart from that just mentioned
doknots. We de®ne a solvable con®guration as (truncating the in®nite expansion in gap matrices
one that can be parsed by our algorithm. That is, to make the algorithm polynomial), we also use
the approximation previously introduced for the con®guration than when the two hairpins just
nested algorithm (that ISs of O. > 2 or multiloops coexist without interaction of any kind.
are described in some approximated form). Despite The algorithm implements coaxial energies for
these limitations, this truncated pseudoknot algor- both nested and non-nested structures. We adopt
ithm seems to be adequate for the currently known the coaxial energies provided by Walter et al.
pseudoknots in RNA folding. (1994) for coaxial stacking of nested structures. For
The algorithm is not complete until we provide coaxial stacking of non-nested structures we
the full recursive expressions to calculate the gap multiply these previous energies by an estimated
matrices. For a given gap matrix, we have to con- (ad hoc) weighting parameter g < 1.
sider all the different ways that its diagram can be Using our diagrammatic representation it is
assembled using one or two matrices at a time. possible to be systematic in describing the poss-
(Again, Feynman diagrams are of great use here.) ible coaxial stacking that can occur. In the gener-
The full description of those diagrams is quite al recursion one has to look for contiguous
involved and the many technical details will not nucleotides, and allow them to be explicitly
add to the clarity of this exposition. In order to paired (but not to each other). This is best under-
give the reader a feeling for the kind of topologies stood with an example. Consider the recursion
the pseudoknot algorithm allows, we provide in for wx in Figure 11, in particular the bifurcation
the Appendix a simpli®ed version of the recursions diagram:
for the gap matrices in which coaxial stacking or
dangles are excluded (see below). wx
i; j ÿ! wx
i; k wx
k 1; j; 8k; i 4 k 4 j
10
Coaxial stacking and dangles In order to allow for the possibility of coaxial
stacking, this bifurcation diagram has to be com-
It is quite frequent in RNA folding to create a plemented with another one in which the nucleo-
more stable con®guration when two independent tides of the bifurcation are base-paired:
con®gurations stack coaxially. This occurs, for
instance, when two hairpin loops with their wx
i; j ÿ! vx
i; k vx
k 1; j C
k; i : k 1; j;
respective stems are contiguous. Then one of them 8k; i 4 k 4 j
11
can fall on top of the other, creating a more stable
RNA Pseudoknot Prediction by Dynamic Programming 2061
We expect this inaccuracy to increase in our case, (Figure 5). In both cases, the result is almost uni-
since the algorithm now allows a much larger con- versal pairing because there is enough freedom to
®guration space. Therefore, our limited objective be able to coordinate any position with another
here is to show that on a few small RNAs that are one in the sequence.
thought to conserve pseudoknots, our program (a Another important aspect of tRNA folding is
minimal-energy implementation of the pseudoknot coaxial energies. Most tRNAs gain stability by
algorithm using a thermodynamic model) will stacking coaxially two of the hairpin loops, and the
actually ®nd the pseudoknots; and for a few small third one with the acceptor stem. This aspect of
RNAs that do not conserve pseudoknots, our pro- tRNA folding is very important and in some cases
gram ®nds results similar to MFOLD, and does not crucial to determine the right structure. There are
introduce spurious pseudoknots. situations like tRNA DA0260 in which MFOLD
does not assign the lowest energy to the correct
structure (the MFOLD 3.0 prediction for DA0260
tRNAs misses the acceptor stem, and has a free energy of
ÿ22.0 kcal/mol). Our algorithm, on the other
Almost all transfer RNAs share a common clo- hand, implements coaxial energies; as a result, the
verleaf structure. We have tested the algorithm cloverleaf con®guration becomes the most stable
on a group of 25 tRNAs selected at random from folding for tRNA DA0260 (G ÿ24.3 kcal/mol).
the Sprinzl tRNA database (Steinberg et al., 1993). The implementation of coaxial energies explains
The program ®nds no spurious pseudoknot for why we found more cloverleaf structures for
any of the tested sequences. All but one (DT5090) tRNAs than MFOLD does.
of the tRNAs fold into a cloverleaf con®guration.
Of the 24 cloverleaf foldings, 15 are completely
consistent with their proposed structures (that is,
each helical region has at least three base-pairs in HIV-1-RT-ligand RNA pseudoknots
common with its proposed folding). The remain-
ing nine cloverleaf foldings misplace one (six High-af®nity ligands of the reverse transcriptase
sequences) or two (three sequences) of the helical of HIV-1 isolated by a SELEX procedure by Tuerk
regions. On the other hand, MFOLD's lowest et al. (1992) seem to have a pseudoknot consensus
energy prediction for the same set of tRNA secondary structure. These oligonucleotides have
sequences includes only 19 cloverleaf foldings, of between 34 and 47 bases, and fold into a simple
which 14 are completely consistent with their pseudoknot. Of a total of 63 SELEX-selected pseu-
proposed structures. Performance for our pro- doknotted sequences available from Tuerk et al.
gram is, therefore, at least comparable with (1992), we found 54 foldings that agreed exactly
MFOLD; the inaccuracies found are the result of with the structures derived by comparative anal-
the approximations in the thermodynamic model, ysis (G ÿ9 kcal/mol for sequence pattern I (3-
not a problem with the pseudoknot algorithm 2)). As expected, MFOLD predicts only one of the
per se. The relevant result in relation to the pseu- two stems (G ÿ7.5 kcal/mol for the same
doknot algorithm is that its implementation pre- sequence).
dicts no spurious pseudoknots for tRNAs.
One should not think of this result as a trivial
one, because when knots are allowed, the con®gur- Viral RNAs
ation space available becomes much larger than
the observed class of conformations. This problem Some virus RNA genomes (such as turnip
is particularly relevant for ``maximum-pairing- yellow mosaic virus, TYMV; Guiley et al., 1979)
like'' algorithms, such as the MWM algorithm pre- present a tRNA-like structure at their 30 -end that
sented by Cary & Stormo (1995) or a Nussinov includes a pseudoknot in the aminoacyl acceptor
implementation of our pseudoknot algorithm arm very close to the 30 -end (Kolk et al., 1998; Pleij
RNA Pseudoknot Prediction by Dynamic Programming 2063
et al., 1985; Dumas et al., 1987). Our program cor- ducible surfaces). This is a further generalization of
rectly predicts the TYMV tRNA-like structure with the results reported by Sankoff (1985). The calcu-
its pseudoknot for the last 86 bases at the 30 -end lation of an optimal folding becomes an expansion
with G ÿ30.4 kcal/mol (the MFOLD 3.0 pre- in ISs of increasingly higher order. Our formaliza-
diction for TYMV has a free energy of tion incorporates all current dynamic program-
G ÿ28.9 kcal/mol). The tRNA-like 30 terminal ming RNA folding algorithms in addition to our
structure is conserved among tymoviruses, and pseudoknot algorithm. It also establishes the limi-
also for the tobacco mosaic virus cowpea strain, tations of each approximation by determining at
another valine acceptor. Of the seven valine-accep- which order the expansion is truncated.
tor tRNA-like structures proposed to date (Van As for the thermodynamic implementation pre-
Belkum et al., 1987), we reproduce six of them, sented here, one of our major problems is the
except for kennedya yellow mosaic virus. almost complete lack of thermodynamic infor-
Another interesting pseudoknot appears in the mation about pseudoknot con®gurations. The ther-
last 189 bases of the 30 terminus of the tobacco modynamic algorithm is also sensitive to the
mosaic virus (TMV; Van Belkum et al., 1985). TMV accuracy of the existing thermodynamic par-
also has a tRNA-like pseudoknot structure at the ameters. We expect to improve this aspect by
end, but it may have additional upstream pseudo- implementing the more complete set of parameters
knots, up to a total of ®ve, forming a long quasi- provided by the Turner group (Mathews et al.,
continuous helix. We folded the upstream and 1998).
downstream regions separately in a piece of 84 The principal drawback is the time and memory
nucleotides (the folding requires 47 minutes and constraints imposed by the computational com-
9.8 Mb) and 105 nucleotides (the folding requires plexity of the algorithm. At this early stage, we
235 minutes and 22.5 Mb), respectively. Our pro- cannot analyze sequences much larger than 130-
gram predicts the 105 nucleotides downstream 140 bases. For now, the program is adequate for
region exactly with G ÿ32.5 kcal/mol. For the folding small RNAs. A 100 nt RNA takes about
84 nucleotides upstream region we ®nd four of the four hours and 22.5 Mb to fold on an SGI R10K
®ve helical regions with G ÿ19.0 kcal/mol. Origin200.
Finally we have considered the recently crystal- Due to practical limitations, at a given point in
lized ribozymes of the hepatitis delta virus (HDV; the recursion we only allow the incorporation of
FerreÂ-D'Amare et al., 1998). Our program predicts two gap matrices. However, since each of those
correctly the structure of the 91 nt antigenomic gap matrices can in turn be assembled by other
HDV ribozyme (G ÿ36.7 kcal/mol). Our pro- two of those matrices, it implies that the algor-
gram also predicts the pseudoknot present in the ithm includes in its con®guration space a large
87 nt genomic ribozyme (G ÿ43.9 kcal/mol; in variety of knotted motifs. The limitations of this
this case the prediction misses the short two-stem truncation appeared when we considered apply-
hairpin between positions 17-30). ing this approach to describe pairwise residue
interactions in protein folding. A parallel b-sheet
con®guration in protein structure provides an
Discussion example of a complicated knotted folding that
cannot be handled by the pseudoknot algorithm
Here, we present an algorithm able to predict presented here. However, all known RNA pseu-
pseudoknots by dynamic programming. This doknots can be handled by the algorithm, which
algorithm demonstrates that using certain approxi- makes the approximation useful enough for RNA
mations consistent with the accepted Turner secondary structure.
thermodynamic model, the prediction of pseudo- Although we implemented the algorithm for
knotted structures is a problem of polynomial com- energy minimization, extending MFOLD to pseu-
plexity (although admittedly high). Having an doknotted structures, the algorithm is not limited
optimal dynamic programming algorithm will to energy minimization. Our algorithm can be con-
enable extending other dynamic programming verted into a probabilistic model for pseudoknot-
based methods that rigorously explore the confor- containing RNA folding. Probabilistic models of
mational space for RNA folding (McCaskill, 1990; RNA second structure based on ``stochastic context
Bonhoeffer et al., 1993) to pseudoknotted struc- free grammar'' (SCFG) formalisms (Eddy et al.,
tures. 1994; Sakakibara et al., 1994; Lefebvre, 1996) have
Apart from the usefulness of the algorithm in been introduced both for RNA single-sequence
predicting pseudoknots, we also include coaxial folding and for RNA structural alignment and
energies (when two stems stack coaxially), a very structural similarity searches. The Inside and CYK
common feature of RNA folding. We expect dynamic programming algorithms used for SCFG-
MFOLD will also include coaxial energies in the based structural alignment are fundamentally simi-
near future (Mathews et al., 1998). lar to the Zuker algorithm (Durbin et al., 1998), and
Our algorithm is presented in the context of a have consequently also been unable to deal with
general framework in which a generic folding is pseudoknots. Heuristic approaches to applying
expressed in terms of its elementary secondary SCFG-like structural alignment models to pseudo-
interactions (which we have identi®ed as the irre- knots have been introduced (Brown, 1996;
2064 RNA Pseudoknot Prediction by Dynamic Programming
Notredame et al., 1997), and the maximum (1987). 3-D graphics modeling of the tRNA-like
weighted matching algorithm has been applied to 30 end of turnip yellow mosaic virus RNA: struc-
®nd optimal alignments (Tabaska & Stormo, 1997). tural and functional implications. J. Biomol. Struct.
An SCFG-like probabilistic version of our pseudo- Dynam. 4, 707-728.
Durbin, R., Eddy, S. R., Krogh, A. & Mitchison, G. J.
knot algorithm could be designed to obtain opti-
(1998). Biological Sequence Analysis: Probabilistic
mal structural alignment of pseudoknot-containing Models of Proteins and Nucleic Acids, Cambridge Uni-
RNAs. versity Press, Cambridge UK.
Eddy, S. R. & Durbin, R. (1994). RNA sequence analysis
using covariance models. Nucl. Acids Res. 22, 2079-
2088.
Methods Edmonds, J. (1965). Maximum matching and polyhe-
dron with 0, 1-vertices. J. Res. Nat. Bur. Stand. 69B,
The algorithm was implemented in ANSI C on a Sili-
con Graphics Origin200. The algorithm has a theoretical 125-130.
worst-case complexity of O(N6) in time and O(N4) in sto- FerreÂ-D'AmareÂ, A. R., Zhou, K. & Doudna, J. A. (1998).
rage. At its present stage, the program is empirically Crystal structure of a hepatitis delta virus ribozyme.
observed to run O(N6.8) in time and O(N3.8) in memory. Nature, 395, 567-574.
For instance, a tRNA of 75 nt takes 20 minutes and uses Freier, S., Kierzek, R., Jaeger, J. A., Sugimoto, N.,
6.6 Mb of memory. The 30 -end of tobacco mosaic virus Caruthers, M. H., Neilson, T. & Turner, D. H.
has 105 nucleotides and takes 235 minutes and uses (1986). Improved free-energy parameters for predic-
22.5 Mb. The program empirically scales above the tions of RNA duplex stability. Proc. Natl Acad. Sci.
theoretical complexity in time of the algorithm. This USA, 83, 9373-9377.
effect seems to have to do with the way the machine Gabow, H. N. (1976). An ef®cient implementation of
allocates memory for larger RNAs. The software and Edmonds' algorithm for maximum matching on
parameter sets are available by request from E. Rivas graphs. J. Asc. Com. Mach. 23, 221-234.
(elena@genetics.wustl.edu). A technical report giving the Gluick, T. C. & Draper, D. E. (1994). Thermodynamics
full algorithm is available from http://www.genetics.- of folding a pseudoknotted mRNA fragment. J. Mol.
wustl.edu/eddy/publications/. Biol. 241, 246-262.
Guilley, H., Jonard, G., Kukla, B. & Richards, K. E.
(1979). Sequence of 1000 nucleotides at the 30 end of
tobacco mosaic virus RNA. Nucl. Acids Res. 6, 1287-
1308.
Gultyaev, A. P., van Batenburg, F. H. & Pleij, C. W. A.
(1995). The computer simulation of RNA folding
pathways using a genetic algorithm. J. Mol. Biol.
Acknowledgments 250, 37-51.
Huynen, M., Gutell, R. & Konings, D. (1997). Assessing
This work was supported by NIH grant HG01363 and
the reliability of RNA folding using statistical mech-
by a gift from Eli Lilly. E.R. acknowledges the support of
a fellowship by the Sloan Foundation. The idea for the anics. J. Mol. Biol. 267, 1104-1112.
algorithm came from a discussion with Gary Stormo at a Kolk, M. H., van der Graff, M., Wijmenga, S. S., Pleij,
meeting at the Aspen Center for Physics. Tim Hubbard C. W. A., Heus, H. A. & Hilbers, C. W. (1998).
suggested parallel b-strands in proteins as an example of NMR structure of a classical pseudoknot: interplay
a set of pairwise interactions that the algorithm cannot of single- and double-stranded RNA. Science, 280,
handle. We wish to thank the anonymous reviewers for 434-438.
very useful comments. Lefebvre, F. (1996). A grammar-based uni®cation of
several alignments and folding algorithms. ISMB-
96 (Rawlings, C., et al., eds), pp. 143-154, AAAI
Press.
References Mathews, D. H., Andre, T. C., Kim, J., Turner, D. H. &
Zuker, M. (1998). An updated recursive algorithm
Abrahams, J. P., van der Berg, M., van Batenburg, E. & for RNA secondary structure prediction with
Pleij, C. W. A. (1990). Prediction of RNA secondary improved free energy parameters. In Molecular Mod-
structure, including pseudoknotting, by computer eling of Nucleic Acids (Leontis, N. B. & SantaLucia,
simulation. Nucl. Acids Res. 18, 3035-3044. J., Jr, eds), American Chemical Society.
Bjorken, J. D. & Drell, S. D. (1965). Relativistic Quantum McCaskill, J. S. (1990). The equilibrium partition func-
Fields, McGraw-Hill, New York, NY. tion and base pair bindings probabilities for
Bonhoeffer, S., McCaskill, J. S., Stadler, P. F. & Schuster, RNA secondary structure. Biopolymers, 29, 1105-
P. (1993). Statistics of RNA secondary structure. 1119.
Eur. Biophys. J. (EHU), 22, 13-24. Notredame, C., O'Brien, E. A. & Higgins, D. G. (1997).
Brown, M. (1996). RNA pseudoknot modeling using RAGA: RNA sequence alignment by genetic algor-
intersections of stochastic context free grammars ithm. Nucl. Acids Res. 25, 4570-4580.
with applications to database search. Paci®c Sym- Nussinov, R., Pieczenik, G., Griggs, J. R. & Kleitman,
posium on Biocomputing 1996. D. J. (1978). Algorithms for loop matchings. SIAM J.
Cary, R. B. & Stormo, G. D. (1995). Graph-theoretic Appl. Math. 35, 68-82.
approach to RNA modeling using comparative Pleij, C. W., Rietveld, K. & Bosch, L. (1985). A new prin-
data. In ISMB-95 (Rawling, C., et al., eds), pp. 75-80, ciple of RNA folding based on pseudoknotting.
AAAI Press. Nucl. Acids Res. 13, 1717-1731.
Dumas, P., Moras, D., Florentz, C., GiegeÂ, R., Sakakibara, Y., Brown, M., Hughey, R., Mian, I. S.,
Verlaan, P., van Belkum, A. & Pleij, C. W. A. SjoÈlander, K., Underwood, R. C. & Haussler, D.
RNA Pseudoknot Prediction by Dynamic Programming 2065
(1994). Stochastic context-free grammars for tRNA Zuker, M. & Sankoff, D. (1984). RNA secondary struc-
modeling. Nucl. Acids Res. 22, 5112-5120. ture and their prediction. Bull. Math. Biol. 46,
Sankoff, D. (1985). Simultaneous solution of the RNA 591-621.
folding, alignment and protosequence problems. Zuker, M. & Stiegler, P. (1981). Optimal computer fold-
SIAM J. Appl. Math. 45, 810-825. ing of large RNA sequences using thermodynamics
Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. and auxiliary information. Nucl. Acids Res. 9,
(1994). From sequences to shapes and back: a case 133-148.
study in RNA secondary structure. Proc. Roy. Soc.
ser. B, 255, 279-284.
Schuster, P., Fontana, W., Stadler, P. F. & Renner, A.
(1997). RNA structures and folding: from conven- Appendix: Recursions for the Gap
tional to new issues in structure predictions. Curr. Matrices in the Pseudoknot Algorithm
Opin. Struct. Biol. 7, 229-235.
Serra, M. J. & Turner, D. H. (1995). Predicting the ther- Here we provide simpli®ed recursion relations
modynamic properties of RNA. Methods Enzymol. for the gap matrices used in the pseudoknot algor-
259, 242-261. ithm, without including dangling and coaxial dia-
Steinberg, S., Misch, A. & Sprinzl, M. (1993). Compi- grams. (As before, contiguous nucleotides are
lation of RNA sequences and sequences of tRNA given explicit dots in the diagrams.)
genes. Nucl. Acids Res. 21, 3011-3015. The recursion for the vhx matrix in the pseudo-
Tabaska, J. E. & Stormo, G. D. (1997). Automated align- knot algorithm is given by (Figure A1):
ment of RNA sequences to pseudoknotted struc-
tures. ISMB-97, 5, 311-318. vhx
i; j : k; l optimal
Tabaska, J. E., Cary, R. B., Gabow, H. N. & Stormo, 8
G. D. (1998). An RNA folding method capable of >
> g2
i; j : k; l
EIS
identifying pseudoknots and base triples. Bioinfor- >
>
>
>
matics, 8, 691-699. < EIS
g2
i; j : r; s vhx
r; s : k; l
ten Dam, E., Pleij, K. & Draper, D. (1992). Structural and
1A
functional aspects of RNA pseudoknots. Biochemis-
>
> g2
r; s : k; l vhx
i; j : r; s
EIS
>
>
try, 31, 11665-11676. >
>
:
Tuerk, C., MacDougal, S. & Gold, L. (1992). RNA pseu- 2 P~ M
e whx
i 1; j ÿ 1 : k ÿ 1; l 1
doknots that inhibit human immunode®ciency virus
type 1 reverse transcriptase. Proc. Natl Acad. Sci. 8i; r; k; l; s; j i 4 r 4 k 4 l 4 s 4 j
USA, 89, 6988-6992.
van Batenburg, F. H. D., Gultyaev, A. P. & Pleij, C. W. A. Here P~ is the score for creating a pair in a pseudo-
(1995). An APL-programmed genetic algorithm for e corresponds to the score given to a
knot, and Ms;
the prediction of RNA secondary structure. J. Theor. non-nested multiloop. P~ and M e could be equal to P
Biol. 174, 269-280. and M, the score for a pair in a nested structure
Van Belkum, A., Abrahams, J. P., Pleij, C. W. A. & and the score assigned to nested multiloops
Bosch, L. (1985). Five pseudoknots are present at respectively, but it does not have to be. Similarly,
the 204 nucleotides long 30 non coding region of g2 ;
the score for an irreducible surface of O(2), EIS
tobacco mosaic virus RNA. Nucl. Acids Res. 13,
could be the same as the score given for nested
7673-7686.
Van Belkum, A., Bingkun, J., Pleij, C. W. A. & Bosch, L. structures, EIS2, but again, it does not have to be.
(1987). Structural similarities among valine-accept- We found the best ®ts by giving them values
ing tRNA-like structures in tymoviral RNAs and different from those used for nested foldings (cf.
elongator tRNAs. Biochemistry, 26, 1144-1151. Tables 2 and 3).
Walter, A., Turner, D., Kim, J., Lyttle, M., MuÈller, P.,
Mathews, D. & Zuker, M. (1994). Coaxial stacking
of helixes enhances binding of oligoribonucleotides
and improves predictions of RNA folding. Proc.
Natl Acad. Sci. USA, 91, 9218-9222.
Woese, C. R. & Pace, N. R. (1993). Probing RNA struc-
ture, function, and history by comparative analysis.
The RNA World (Gesteland, R. F. & Atkins, J. F.,
eds), pp. 91-117, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY.
Wyatt, J. R., Puglisi, J. D. & Tinoco, I., Jr (1990). RNA
pseudoknots: stability and loop size requirements.
J. Mol Biol. 214, 455-470.
Zuker, M. (1989a). Computer prediction of RNA struc-
ture. Methods Enzymol. 180, 262-288.
Zuker, M. (1989b). On ®nding all suboptimal foldings of
an RNA molecule. Science, 244, 48-52.
Zuker, M. (1995). ``Well-determined'' regions in RNA
secondary structure prediction: analysis of small
subunit ribosomal RNA. Nucl. Acids Res. 23, 2791-
2798. Figure A1. Recursion for the vhx matrix.
2066 RNA Pseudoknot Prediction by Dynamic Programming
Figure A2. Recursion for the zhx matrix. Figure A3. Recursion for the yhx matrix.
The recursions for the gap matrices zhx and yhx in the pseudoknot algorithm are complementary and
given by (cf. Figures A2 and A3):
Finally, the recursion for the gap matrix whx appears in Figure A4, and is given by:
Figure A4. Recursion for the whx
matrix.
.
2068 RNA Pseudoknot Prediction by Dynamic Programming
8i; k; j 14i4k4j4N
http://www.academicpress.com/jmb