Protein Repeats: Structures, Functions, and Evolution

Miguel A. Andrade,*,† Carolina Perez-Iratxeta,*,† and Chris P. Ponting‡
*European Molecular Biology Laboratory, Meyerhofstr. 1, Heidelberg 69012, Germany; †Department of Bioinformatics, Max Delbrück Center for Molecular Medicine, Berlin-Buch 13092, Germany; and ‡MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, United Kingdom
Received March 13, 2001, and in revised form June 11, 2001; published online August 2, 2001
Internal repetition within proteins has been a successful stratagem on multiple separate occasions throughout evolution. Such protein repeats possess regular secondary structures and form multirepeat assemblies in three dimensions of diverse sizes and functions. In general, however, internal repetition affords a protein enhanced evolutionary prospects due to an enlargement of its available binding surface area. Constraints on sequence conservation appear to be relatively lax, due to binding functions ensuing from multiple, rather than single, repeats. Considerable sequence divergence as well as the short lengths of sequence repeats mean that repeat detection can be a particularly arduous task. We also consider the conundrum of how multiple repeats, which show strong structural and functional interdependencies, ever evolved from a single repeat ancestor. In this review, we illustrate each of these points by referring to six prolific repeat types (repeats in β-propellers and β-trefoils and tetratricopeptide, ankyrin, armadillo/HEAT, and leucine-rich repeats) and to other less-prolific but nonetheless interesting repeats. © 2001 Academic Press
INTRODUCTION
Past innovation in protein functions and structures is due, for the most part, to gene duplication (Ohno, 1970). Duplication and recombination within a single gene have often given rise to non-overlapping regions of a protein sequence that share significant sequence similarity. Such repeats are relatively common, occurring in at least 14% of all proteins (Marcotte et al., 1999). Repeats vary considerably in size, from short amino acid repetitions, for example, the polyglutamine tracts of the Huntington disease gene product huntingtin, to large repetitions containing multiple domains, such as in the cytoskeletal protein titin.
In this review, we concentrate on sequence repeats that occur tandemly in sequence and that form integrated assemblies when viewed as three-dimensional structures. Such repeats are essentially defined by their multiplicity and thus differ from both domains and motifs since these can occur singly. The importance of repeats in understanding biological function resides not only in their high frequency among known sequences, but also in their abilities to confer multiple binding and structural roles on proteins. This functional versatility is apparent not only among different repeat types, but also for similar repeats from the same family.
Understanding repeats, with respect to their structures, functions, and evolution, therefore represents a considerable challenge. How are we able to predict repeats within protein sequences? What are the relationships between repeats and their functions? In this review we describe six major repeat classes and their functions, structures, and possible evolutionary mechanisms. We attempt to describe how repeat identification can be linked to enhanced biological knowledge.
Journal of Structural Biology 134, 117–131 (2001); doi:10.1006/jsbi.2001.4392, available online at http://www.idealibrary.com. 1047-8477/01 $35.00. Copyright © 2001 by Academic Press. All rights of reproduction in any form reserved.
EVOLUTION OF REPEATS
Repeats are thought to arise via intragenic duplication and recombination events. Selective advantage of multiple repeats results in these mutations being fixed among populations. With the benefit of hindsight and the large numbers of sequences known, it is clear that repetitions of small structural units might confer several advantages on proteins, and thereby on their organisms, that are distinct from those of repeated domains. For example, tandemly repeated structures often occur in regular arrangements, either in linear arrays (e.g., see Iafp in Fig. 1) or as a superhelix with repeats arranged about a common axis (e.g., see HEAT in Fig. 1). For such "open" structures there is no theoretical limit on their repeat number, since incremental addition of repeats is not sterically impeded. These rod-like or superhelical structures present an extensive solvent-accessible surface that is well suited to binding large substrates such as proteins and nucleic acids. By contrast, duplication of repeats in a superhelix with a small pitch results in a "closed" barrel-like structure, with a relatively small surface area available for interactions with smaller ligands (e.g., see Kelch in Fig. 1). These assemblies are likely to present advantages different from those of the open structures of rods and superhelices. They are compact and stable, with opportunities for small ligands to be bound either along the internal axis of the barrel or on the axis at the barrel's periphery.
Following fixation of a repeat duplication, sequence similarities among repeats may erode quickly. Thus equivalent HEAT repeats in invertebrate and mammalian orthologues average only 13% sequence identity (Andrade et al., 2001). These slight similarities imply that the functional constraints on individual repeats are relatively weak, when compared to the constraints imposed on the repeat assembly as a whole. By contrast, a function that is exacting on the structure of repeats, such as those in the ice-binding β-sheet domain of insect antifreeze proteins (Liou et al., 2000), results in repeats being highly similar in sequence.
The numbers of repeats can vary even between orthologues, indicating that rapid loss and/or gain of repeats occurs frequently in evolution. This is neatly underscored by the demonstration that different alleles of a protein from the fungus Podospora anserina possess different numbers of WD40 repeats (Saupe et al., 1995).
As we discuss below when describing major repeat classes, the most common function of repeat ensembles is that of binding to proteins. Such a property provides opportunities for the organism to expand its repertoire of cellular functions, such as protein transport, protein-complex assembly, and protein regulation, using preexisting genetic material. Accordingly, even though the ability to generate repeats appears to be a general phenomenon of all phyla, repeats are more common in eukaryotic organisms than in prokaryotic ones (Marcotte et al., 1999) and in metazoans more than in the rest of the eukaryotes (see Table I). This may be associated with the increasing complexity of cellular functions that are readily available from assemblies of repeats.
DETECTION OF REPEATS
Identifying tandem repeats with high sequence similarities is relatively straightforward. Detecting homologous repeats whose similarities are low, however, represents a more considerable challenge. Compounding this is the issue of defining the boundaries of repeats. In some cases repeat boundaries may be assigned from the positions of flanking domains or repeats or from bona fide protein termini. Frequently the boundaries are predicted simply from an expectation that repeats occur in integer multiples and that homologues' repeat boundaries are always coincident.
Unfortunately repeats can occur in noninteger multiples and their boundaries often do not coincide. For example, arrays of bihelical repeats may consist of an integer number of helix 1-2 pairs, with a single additional flanking helix (helix 2 at the N-terminus or helix 1 at the C-terminus) representing a half-repeat. Repeats in closed β-propeller barrel structures do occur only in integer multiples but often do not exactly correspond to the repeats seen in structure. This is due to the circular permutation of the sequence repeats with respect to the structure repeats.
FIG. 1. Tertiary structures of several proteins with structural repeats. Alternating repeats are shown in different colours. Kelch is the galactose oxidase from D. dendroides (Ito et al., 1991) and Fgf is the acidic fibroblast growth factor from H. sapiens (Eriksson et al., 1993); these are examples of different closed structures, a β-barrel and a β-trefoil, respectively. TPR is a fragment of the human protein phosphatase 5 (Das et al., 1998). HEAT is the protein phosphatase 2A PR65/A from H. sapiens, which is an open solenoid-like structure (see text) (Groves et al., 1999). LRR is the porcine ribonuclease inhibitor complexed with the ribonuclease (Kobe and Deisenhofer, 1995). Fib corresponds to the adenovirus fibre protein from the human adenovirus type 2 (van Raaij et al., 1999); the two views of the structure show a triple β-spiral (Table III). Iafp is the insect antifreeze protein from Tenebrio molitor (Liou et al., 2000), a small all-β protein (Table III). ANK is a fragment of the β-subunit of the GA-binding protein from mouse (Batchelor et al., 1998) complexed with the α-subunit and 21 bp of DNA. The corresponding PDB identifiers are: Kelch, 1gof; Fgf, chain A from 2afg; TPR, 1a17; HEAT, chain A from 1b3u; LRR, chain I from 1dfj; Fib, chain A from 1qiu; Iafp, chain A from 1ezg; ANK, chain B from 1awc.
TABLE I
The Numbers and Percentages of Proteins That Are Annotated by the SwissProt Database (Bairoch and Apweiler, 2000) with the Feature "Repeat", Sorted by Taxon

Taxon                Number containing repeats/total   Percentage
Archaea              27/3428                           0.79
Viruses              81/8048                           1.00
Bacteria             299/28438                         1.05
Fungi                232/8334                          2.78
Viridiplantae        153/6963                          2.20
Metazoa              1538/28948                        5.31
Rest of Eukaryota    92/2434                           3.78
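The percentages in Table I follow directly from the counts in its middle column. As a minimal sketch of that arithmetic, the Python below recomputes them from the transcribed counts and ranks the taxa; the names `TABLE_I`, `repeat_percentage`, and `ranked_by_percentage` are illustrative and do not come from the original analysis.

```python
# Counts transcribed from Table I: (taxon, proteins annotated with repeats, total proteins).
TABLE_I = [
    ("Archaea", 27, 3428),
    ("Viruses", 81, 8048),
    ("Bacteria", 299, 28438),
    ("Fungi", 232, 8334),
    ("Viridiplantae", 153, 6963),
    ("Metazoa", 1538, 28948),
    ("Rest of Eukaryota", 92, 2434),
]


def repeat_percentage(with_repeats: int, total: int) -> float:
    """Percentage of proteins in a taxon that carry the repeat annotation."""
    return 100.0 * with_repeats / total


def ranked_by_percentage(table):
    """Return (taxon, percentage) pairs, highest percentage first."""
    rows = [(taxon, repeat_percentage(n, total)) for taxon, n, total in table]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for taxon, pct in ranked_by_percentage(TABLE_I):
        print(f"{taxon:<18} {pct:5.2f}%")
```

Run as a script, this lists Metazoa first and Archaea last, in line with the trend described in the text.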
Nevertheless, repeat detection has become considerably easier in recent years due to the advent of Web-based resources, such as SMART (smart.embl-heidelberg.de; Schultz et al., 1998) and Pfam (www.sanger.ac.uk/Pfam; Bateman et al., 2000), both of which perform well in predicting frequently occurring repeats. A new server, REP (www.embl-heidelberg.de/andrade/papers/rep/search.html; Andrade et al., 2000), is also proficient in detecting common repeats. It is emphasized that, due to the problems outlined above, these and other methods are unable to predict all repeats with complete accuracy.
Identifying repetitive regions of single protein sequences invariably involves the analysis of suboptimal alignments. An optimal alignment of a sequence (with i amino acids) with itself is the path with the highest associated alignment score taken through the i × i trace matrix. The first and subsequent suboptimal alignments are given by the next highest scoring paths. High-scoring paths can be visualized using Dotter (www.cgr.ki.se/cgr/groups/sonnhammer/Dotter.html; Sonnhammer and Durbin, 1995). Estimating whether such alignments represent past evolutionary duplication events or whether the internal sequence similarity arose simply by chance has, until recently, been a thorny issue. A classic approach to estimating the significance of sequence similarity has been to compare the alignment score to those generated by randomly shuffling the aligned sequences (McLachlan, 1983). Useful implementations of this have recently been described (Pellegrini et al., 1999; Heger and Holm, 2000).
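The shuffling idea described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the method of McLachlan (1983) or of the cited implementations: it scores only ungapped off-diagonal segments, uses simple identity scoring (+1 match, -1 mismatch) in place of a substitution matrix, and does not exclude segments that overlap themselves. The function names and parameters are hypothetical.

```python
import random


def best_offdiagonal_score(seq: str, match: int = 1, mismatch: int = -1) -> int:
    """Best ungapped local segment score on any off-diagonal of the
    self-comparison matrix (the main diagonal, offset 0, is skipped)."""
    n = len(seq)
    best = 0
    for offset in range(1, n):
        running = 0
        for i in range(n - offset):
            step = match if seq[i] == seq[i + offset] else mismatch
            running = max(0, running + step)  # local-alignment style reset
            best = max(best, running)
    return best


def shuffle_p_value(seq: str, n_shuffles: int = 200, seed: int = 0):
    """Empirical P value: fraction of composition-preserving shuffles whose
    best off-diagonal score reaches the score observed for the real sequence."""
    rng = random.Random(seed)
    observed = best_offdiagonal_score(seq)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if best_offdiagonal_score("".join(residues)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    toy = "ACDEFGHIKL" * 3  # three perfect copies of a ten-residue unit
    score, p = shuffle_p_value(toy)
    print(score, p)
```

For the toy sequence the observed score is 20 (two consecutive repeat copies aligned at an offset of ten residues). Real repeats are far more divergent, which is why gapped alignments and proper substitution scores matter in practice.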
MACAW (Schuler et al., 1991) can also be used to assess sequence similarity significance. By contrast to the aforementioned methods, MACAW provides probabilities P that the observed similarities have arisen through chance alone. Here the sequence must be compared against itself and a search space used that is the square of the sequence length in amino acids. This method is not entirely satisfactory since it is not amenable to large-scale studies looking for internal repeats in more than one protein, and it considers only ungapped alignments.
One further elegant and statistically robust approach, which generates P values for suboptimal alignments, is provided in the Prospero/Ariadne suite (www.well.ox.ac.uk/rmott/ariadne.html; Mott and Tribe, 1999; Mott, 2000). This method accounts for variations in sequence composition and length in its derivation of P for gapped alignments and thus should be the method of choice in assessing the significance of internal sequence similarities.
The popular BLAST suite of programs (Altschul et al., 1997), and in particular PSI-BLAST, may also be used to detect repeats. It is emphasized, however, that BLAST's statistics are provided on the basis of optimal, rather than suboptimal, alignments. Consequently, for suboptimal alignments these statistics are not able to provide good estimates of either P or E, the number of proteins with associated (optimal) alignment scores greater than, or equal to, a score x expected purely by chance. The presence of repeats in a sequence used as a query in PSI-BLAST runs is indicated usually by: (1) the same region of the query being aligned against two distinct regions of a second protein with an associated E value less than about 10 or (2) different regions of the query being aligned against the same region of a second protein, again with E < 10.
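The two statistics named above are linked, in BLAST-style statistics, by the standard Poisson relationship P = 1 - exp(-E): if E chance hits are expected, P is the probability of seeing at least one. The short Python sketch below only illustrates that conversion; the function name `p_from_e` is hypothetical, and the caveat in the text still applies, since an E value derived for optimal alignments does not become reliable for suboptimal ones by being converted to P.

```python
import math


def p_from_e(e_value: float) -> float:
    """Probability of at least one chance hit when e_value hits are expected,
    assuming the number of chance hits is Poisson distributed."""
    if e_value < 0:
        raise ValueError("E values cannot be negative")
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    for e in (1e-5, 0.002, 0.1, 10.0):
        print(f"E = {e:g} -> P = {p_from_e(e):.6g}")
```

For small E the two quantities are nearly equal, whereas for E near 10 the probability of at least one chance hit approaches 1, so a match at that level is only a hint that needs confirmation by the other methods described here.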
Once the presence of repeats with statistically significant similarities in a protein has been established, it is appropriate to construct their multiple alignment. Further repeat homologues, identified by (PSI-)BLAST searches of databases (with an E value inclusion threshold E_T = 0.002 for inclusion in the profile used in the subsequent search iteration) using the original repeats as queries, should be added to this alignment. The multiple alignment should be optimized by hand editing following guidelines given elsewhere (Bork and Gibson, 1996; Ponting and Birney, 2000). From this alignment, a hidden Markov model (HMM) may be constructed and compared with protein sequence databases using, for example, the HMMER suite (hmmer.wustl.edu; Eddy, 1998). HMMER is appropriate for collating protein repeats since it successfully applies a heuristic strategy to detect bona fide repeats whose individual E values (for optimal alignment statistics) appear to be insignificant, but are deemed significant by combining the highest scores of other repeats in the protein. Repeats should be considered significant if their (per-sequence, rather than per-repeat) E values are less than 0.1.
Detection Example: New Repeats in Spindlin
As an example of detecting repeats, we describe an analysis of spindlin, a spindle-associated protein with roles in early mouse embryo development (Oh et al., 1997). Repeats were detectable within spindlin using four different methods. First, comparison of this sequence with itself using Dotter (Sonnhammer and Durbin, 1995) showed similarity not only along the diagonal (which represents an exact match of the sequence with itself) but also in off-diagonal positions (which represent similar, but nonidentical, regions) (Fig. 2a). This suggests, but does not provide statistical evidence for, internal repeats within spindlin. Second, a gapped BLAST (Altschul et al., 1997) search of NCBI's nonredundant database using the Mus musculus spindlin sequence as a query revealed significant similarity to, among others, its orthologue in Mus spicilegus. The significant similarity again resided not only along the diagonal (E = 6 × 10⁻⁶³) but also in an off-diagonal second alignment (E = 1 × 10⁻⁵). Third, self-comparison of the spindlin sequence using Prospero (Mott and Tribe, 1999) showed two off-diagonal regions of significant similarity (P = 1.1 × 10⁻¹³ and 6.0 × 10⁻⁵). Last, a self-comparison of spindlin using MACAW (Schuler et al., 1991) revealed three pairs of ungapped alignment blocks with significant (P = 6.9 × 10⁻⁹, 5.2 × 10⁻⁵, and 6.2 × 10⁻³) similarities (here, the relevant search space is the square of the number of amino acids in spindlin, 240²).
Once statistical significance of repeats was assured, their sequences were multiply aligned (Fig. 2b). For this, the boundaries of repeats needed to be assigned. In the case of the three spindlin repeats this was not particularly problematic since these together span the complete protein sequence. Thus, the N-terminal repeat boundary coincides with the protein's N-terminus and the C-terminal boundary coincides with the protein's C-terminus. For the sake of completeness, an HMM constructed from the spindlin repeats' multiple alignment was compared with current protein sequence databases using HMMER (Eddy, 1998), but no further homologues were detected. The spindlin repeats appear to be all β-strand structures, but their functions remain unknown.
SIX MAJOR REPEAT FAMILIES
Many protein repeat families are known, each with different structures, functions, and phylogenetic distributions. For the purpose of this review, we have chosen to classify families according to their tertiary structures, although other ways of classification are of equal merit. The six repeat families we shall discuss (Table II) include two families each of the three major structural types: all-β (β-propellers and β-trefoils), all-α (armadillo/HEAT and TPR-like repeats), and mixed α/β (leucine-rich and ankyrin repeats). These examples provide ample evidence for the evolutionary mechanisms of their propagation.
β-Propellers
The WD40 repeat (Neer et al., 1994) is the most common repeat detected among known human proteins. Each repeat contains approximately 40 amino acids, including well-conserved Trp (W) and Asp (D) residues. The crystal structure of an assembly of seven WD40 repeats (e.g., Sondek et al., 1996) revealed that each repeat represents a four-stranded antiparallel β-sheet (a "blade") arranged radially in a propeller arrangement about a central axis. Such β-propeller structures are also seen in methylamine dehydrogenase heavy chain (PQQ repeats), regulator of chromosome condensation 1 (RCC1 repeats), and galactose oxidase (Kelch repeats) (each containing seven blades) and in neuraminidase (containing six blades) (reviewed in Murzin, 1992).
FIG. 2. Detection of repeats in spindlin. (a) Dot plot of spindlin (SPIN_MOUSE (horizontal) vs SPIN_MOUSE (vertical)). (b) Multiple alignment of repeats in spindlin.
In recent years several families of domains have been shown to adopt β-propeller structures with four, five, six, seven, or eight blades. These structures may be browsed using the SCOP resource (scop.mrc-lmb.cam.ac.uk/scop/; Lo Conte et al., 2000). Several other families of repeats have also been predicted to adopt a propeller-like structure, for example, YWTD (Springer, 1998) and integrin subunits (Springer, 1997).
β-Propeller structures are closed structures with interactions between the N- and C-terminal repeats. As described previously, the periodicities of some β-propeller sequence repeats do not exactly match the periodicities of their structural repeats. In these cases the sequence repeat is circularly permuted with respect to the structural repeat. A "Velcro" model of closure of propellers has been proposed (Neer and Smith, 1996), with one of the blades being formed from β-strands from both the most N-terminal and the most C-terminal of sequence repeats.
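The circular permutation just described can be made concrete with a toy model. The Python sketch below assumes, purely for illustration, that every sequence repeat contributes four β-strands and that the structural blades are shifted by one strand relative to the sequence repeats; both numbers and the function name `blades_from_sequence_repeats` are assumptions rather than values taken from a particular structure.

```python
def blades_from_sequence_repeats(n_repeats: int,
                                 strands_per_repeat: int = 4,
                                 offset: int = 1):
    """Group strands into structural blades when the blade register is shifted
    by `offset` strands relative to the sequence repeats.

    Each strand is labelled (sequence repeat number, strand number within it).
    """
    strands = [(rep, strand)
               for rep in range(1, n_repeats + 1)
               for strand in range(1, strands_per_repeat + 1)]
    # Close the ring: the first `offset` strands pair with the C-terminal end.
    rotated = strands[offset:] + strands[:offset]
    return [rotated[i:i + strands_per_repeat]
            for i in range(0, len(rotated), strands_per_repeat)]


if __name__ == "__main__":
    for number, blade in enumerate(blades_from_sequence_repeats(7), start=1):
        repeats_used = sorted({rep for rep, _ in blade})
        print(f"blade {number}: strands from sequence repeats {repeats_used}")
```

With seven repeats and an offset of one strand, the last blade combines strands from sequence repeats 7 and 1, which is the closure that the "Velcro" model describes. Setting `offset=0` recovers the unpermuted case in which each blade corresponds to exactly one sequence repeat.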
Repeat families commonly represent either enzymes or nonenzymes, but rarely both. It is unusual therefore that some β-propellers are enzymes, whereas others are not. Whether catalytic or not, β-propellers have a significant preference for binding proteins and other ligands along the propeller axis at the surface formed by the N-termini of interior β-strands (Russell et al., 1998). This observation of a ligand-binding "supersite" in β-propellers was recently used to predict residues that contribute to the ligand-binding site of TolB (Ponting and Pallen, 1999a). Not only was the prediction (Ponting and Pallen, 1999a) of a β-propeller domain in TolB correct (Abergel et al., 1999) but also the predicted ligand-binding residues (in the loops between β-strands 2 and 3, and 4 and 1; Fig. 3) were found to correlate with experimentally derived functional residues (Ray et al., 2000). This demonstrates that supersite information can be used to predict binding sites even in the absence of tertiary structure data.
Recent studies indicate that the β-propellers of multidomain proteases may directly select substrates by size exclusion. The crystal structure of the prolyl oligopeptidase β-propeller domain shows that it lacks the usual "Velcro" of a blade formed by N- and C-terminal β-strands (Fülöp et al., 1998). Instead, the terminal blades associate only via hydrophobic interactions. The enzyme's active site, which cleaves substrates no longer than 30 amino acids, faces the narrow (~4 Å) entrance of the propeller. It is proposed that this entrance is enlarged by the "breathing" of the propeller between the first and last blades. The size of the enlarged entrance is thought to act to exclude large substrates, thereby preferentially specifying the small (≤30 amino acid) polypeptide substrates. By analogy, a similar mechanism has been proposed for the β-propeller domain of the tricorn protease (Ponting and Pallen, 1999b).
β-Trefoils
Another all-β-sheet closed structure with internal repeats is the β-trefoil. This fold is found in known tertiary structures of fibroblast growth factors (FGFs), interleukin-1s, Kunitz soybean trypsin inhibitors, ricin-like toxins, plant agglutinins, and hisactophilin-like actin-bundling proteins (Murzin et al., 1992; Ponting and Russell, 2000). By contrast to the β-propellers, however, β-trefoils do not appear to possess a supersite since members of the fold family often bind their respective protein ligands in different topological locations (Russell et al., 1998). Consequently, predictions of binding sites, such as those described above for β-propellers, are not feasible.
FIG. 3. Prediction of a supersite in TolB. Sequence analysis indicated the presence of a β-propeller domain in TolB (Ponting and Pallen, 1999a). On the basis of supersite information (Russell et al., 1998), the binding site of TolB was mapped from other β-propeller heterodimer structures onto the multiple alignment (alignment positions marked with asterisks). This prediction corresponds well with several amino acids involved in suppressor mutations of pal A88V (Ray et al., 2000).
A recent study of β-trefoil structures and sequences (Ponting and Russell, 2000) provides insights into the evolution of closed repeat assemblies. The β-trefoil fold consists of six two-stranded β-hairpins, three of which form a barrel structure, while the remaining three form a triangular cap on the barrel (Murzin et al., 1992). Three pairs of these two-stranded β-hairpins can be seen as repeats in the crystal structures, but are not immediately apparent from their sequences. The recent more detailed analysis, however, demonstrated the presence of four β-trefoils in the actin-binding fascin proteins and showed that the internal triplications within each of the β-trefoils are significantly similar in sequence.
This indicates that the three internal repeats in fascin β-trefoils arose not via convergent evolution but instead by divergence from a single-repeat common ancestor. As a protein possessing only a single repeat is unlikely to be stable as a monomer, perhaps the most parsimonious explanation for the evolution of the β-trefoil triplicated repeat is that a homotrimer-forming progenitor repeat underwent successive gene duplication events giving rise to a three-repeat-containing monomer. We return to this issue at the end of this review.
TPR-Like
Tetratricopeptide repeats (TPRs) contain approximately 34 amino acids arranged in two α-helices that are packed together in a knobs-in-holes manner (Sikorski et al., 1990; Lamb et al., 1995). Convergent evolution of TPRs is unlikely given their relatively strong conservation of sequence. The TPR is likely to be an ancient repeat since it is found in eukarya, bacteria, and archaea (Ponting et al., 1999). Multiple TPRs form a right-handed superhelix (Das et al., 1998) with a groove of large surface area available for ligand binding. This groove is employed in the binding of molecular chaperone Hsp70's C-terminal tail (Scheufler et al., 2000). By contrast the groove is not used for molecular recognition by the TPRs of p67phox (Lapouge et al., 2000). Thus TPR assemblies show multiple modes of ligand binding and do not appear to possess a single supersite.
TPRs come in many different flavours that form distinct sequence subfamilies. These include repeats in: kinesin light chains (Gindhart and Goldstein, 1996), SNAP secretory proteins (Ordway et al., 1994), clathrin heavy chains, and bacterial aspartyl-phosphate phosphatases (Andrade et al., 2000). In-depth studies of helical repeats (Andrade et al., 2000; Ponting, 2000) also show that repeat families, such as HAT repeats (Preker and Keller, 1998), protein farnesyltransferase α-subunit repeats (Boguski et al., 1992), and Sel-1 repeats, are distant homologues of TPRs. These sequence-based studies indicate that the characteristic bihelical TPR has proliferated as a result of its ability to acquire multiple functional roles. However, the prediction of these different roles solely on the basis of sequence currently remains elusive.
Ankyrin
These repeats take their name from one of the proteins in which they were first found, the human erythrocyte protein ankyrin (Lux et al., 1990). Each repeat contains approximately 33 residues and forms an L-shaped structure consisting of two antiparallel α-helices followed by a β-hairpin (Gorina and Pavletich, 1996). The hairpins of different repeats pack tightly together forming an antiparallel β-sheet. Hydrophobic residues in the α-helices form complementary nonpolar surfaces that assemble forming an extended helical bundle. Additional hydrogen bonds between residues of adjacent repeats contribute to further stabilization of the ensemble. The smaller sizes of the side chains lining the inner α-helices, and the left-handed twist of the stacking, produce a characteristic solvent-accessible groove (Sedgwick and Smerdon, 1999).
The function of the ankyrin repeats is to bind other proteins but they do not bind a single class of proteins. For example, several structures show ankyrin repeats complexed with other proteins (reviewed in Sedgwick and Smerdon, 1999), such as p53 (a nuclear tumour suppressor), CDK6 (cell division protein kinase), and p65 (a transcriptional regulator). Other known cases are the interaction between the developmental protein Notch and deltex (a cytoplasmic protein) (Diederich et al., 1994) and the interaction between the noncatalytic subunit M130 and the catalytic subunit PP1c of the smooth muscle myosin phosphatase (Hirano et al., 1997). These tertiary structures of complexes show that although there is considerable sequence variation at the heterodimer interface, the interactions involve the extended groove formed by the antiparallel β-sheet (Sedgwick and Smerdon, 1999). This mechanism is similar to that observed in armadillo and HEAT repeats.
Ankyrin repeats are present in a large number of protein families, including transcription factors, developmental regulators, cytoskeletal proteins, and toxins. Sequence and taxonomic analysis of these repeats suggests that their phyletic propagation between eukaryotes, bacteria, and viruses has involved multiple events of horizontal gene transfer (Bork, 1993). For example, the only archaeal sequence currently known to have these repeats (possibly five copies) is a Thermoplasma acidophilum hypothetical sequence (SPTREMBL code Q9HLN1) that is more similar to eukaryotic sequences than to any archaeal sequence.
Armadillo/HEAT
Armadillo repeats (Peifer et al., 1994) were first identified in the product of the eponymous D. melanogaster segment polarity gene (Riggleman et al., 1989). They were later found in several eukaryotic proteins, including the junctional plaque protein plakoglobin, β-catenin, the tumour suppressor adenomatous polyposis coli, and the nuclear transport factor importin-α, among others.
HEAT repeats derive their name from four diverse eukaryotic proteins in which they were first identified: huntingtin (involved in Huntington's disease), elongation factor 3, PR65/A subunit of protein phosphatase 2A, and the TOR (target of rapamycin) (Andrade and Bork, 1995). They are also present in importins β1 and β2 (with a Ran-binding function), in proteins related to the clathrin-associated adaptor complex (Andrade and Bork, 1995), in the microtubule-binding colonic and hepatic tumor-related protein (CTOG) family (Andrade et al., 2000), and in many other proteins related to chromosome dynamics (Neuwald and Hirano, 2000).
Armadillo repeats consist of three helices. The first of these is short (about eight amino acids long) and lies perpendicular to the other two, longer, α-helices that pack against one another. HEAT repeats have two antiparallel helices. The first HEAT helix has a kink (of variable extent) that makes it equivalent to both the first and the second helices of armadillo repeats. The C-terminal helices of both armadillo and HEAT repeats are also superimposable. The parallel stacking of repeat units forms a solenoid. Depending on the structure, these solenoids may have different degrees of curvature but all exhibit a groove formed by the last helix of each repeat. As in ankyrins, protein-protein interactions have been seen to occur within this groove. The binding of importin-α by importin-β (Cingolani et al., 1999), RanGTP by transportin (Chook et al., 1999), and nuclear localization signal peptides by importin-α (Conti et al., 1998) all exhibit binding sites within this groove. However, protein recognition can also occur on the opposite end of the solenoid, as with the binding of FxFG nucleoporin repeats by importin-β (Bayliss et al., 2000). Further similarities between armadillo and HEAT repeat families include a series of conserved residues that form the repeats' hydrophobic cores (Andrade et al., 2001).
In some cases sequence and structural features can distinguish between different variants of these repeats (discussed in Andrade et al., 2001). For example, for the HEAT repeats of the PR65/A subunit of protein phosphatase 2A, charged residues in the loop linking the repeats' helices were shown to form a ladder of electrostatic interactions between adjacent repeats (Groves et al., 1999). These are also present in the HEAT repeats of elongation factor 3, but not in those of importin-β. A conserved asparagine in the last helix of armadillo repeats is involved in protein-protein contacts, such as recognition of the nuclear localization signal by importin-α (Conti et al., 1998). This conserved asparagine is absent in HEAT repeats.
A common phylogenetic origin (homology) for the armadillo and HEAT repeats present in the nuclear protein transport complex has been proposed (Malik et al., 1997; Cingolani et al., 1999). Other repeat families are known which exhibit considerable structural similarity to armadillo/HEAT repeats but show no detectable sequence similarity. These include the all-helical structures of VHS domains (Lohi and Lehto, 1998; Mao et al., 2000) and regions of phosphoinositide 3-kinase (Walker et al., 1999) and eukaryotic initiation factor 4G (Marcotrigiano et al., 2001). Without additional evidence, divergent and convergent evolution of HEAT/armadillo repeats and these structures appear equally plausible.
Leucine-Rich Repeats
Leucine-rich repeats (LRRs) (Kobe and Deisenhofer, 1994) are relatively short in comparison to other repeat families, with lengths of about 20 amino acids. They are associated with an astonishing variety of functions, including signal transduction, transmembrane receptors, DNA repair, cell adhesion, and extracellular matrix proteins. They are also not restricted to eukaryotes, since bacterial and viral versions are known. The common function among LRRs is that they form complexes with other proteins. For example, the LRRs of ribonuclease A inhibitor bind to ribonuclease A (Kobe and Deisenhofer, 1995), LRRs of the extracellular matrix leucine-rich repeat glycoprotein/proteoglycan family (Iozzo, 1998) interact with transforming growth factor β (Hildebrand et al., 1994) and collagen (Svenson et al., 2000), LRRs of platelet glycoproteins associate with thrombin and von Willebrand factor (Shen et al., 2000), and LRRs of plant disease resistance gene products form a pathogen-recognition domain (Van Der Biezen and Jones, 1998).
The first crystal structures of LRRs showed each repeat to contain a β-strand and an α-helix that are oriented in an antiparallel manner (Kobe and Deisenhofer, 1995; Price et al., 1998). The side-by-side association of repeats builds an arch, with the β-strands forming the arch's interior, which harbors an extended protein-binding surface.
Somewhat surprisingly, later structures were found to be rather different. In particular, the structure of the Internalin B protein from Listeria monocytogenes also shows an array of β-strands, forming the inside surface of the arch, but its outside surface is composed of 3_10, rather than α-, helices (Marino et al., 1999).
The so-called leucine-rich-variant repeats of a hypothetical protein from Azotobacter vinelandii also assemble as an arch, but with an α-helix on its inside and a 3_10 helix on its outside (Peters et al., 1996). Furthermore, there is only slight sequence similarity to leucine-rich repeats in their patterns of conserved hydrophobic residues. Therefore, these repeats are unlikely to be homologues of leucine-rich repeats.
OTHER REPEAT FAMILIES
Other repeat families are too numerous to describe here. Instead, in this section we shall discuss families that demonstrate important differences in structure, function, and evolution when compared to β-propellers, β-trefoils, TPRs, and ankyrin, ARM/HEAT, and leucine-rich repeats (see Table III). Since these six repeat families form regular, non-fibrous, monomeric structures, we shall discuss other repeat families that lack structure, that form rod-like structures, or that form oligomers.
The fibronectin-binding repeats of staphylococcal proteins are known not to form a regular tertiary structure in solution (Penkett et al., 2000). These are unusual in that they appear to adopt a regular tertiary structure only when bound to their ligand, the mammalian extracellular protein fibronectin. Their unfolded conformations may be linked to the bacterial proteins' abilities to evade both proteolytic and immune defenses of the mammalian hosts.
Many repeats form rigid linear arrays, or rods. A great number of these are oligomeric coiled-coil proteins containing between two and five amphipathic α-helices (Burkhard et al., 2001). These long helices often wind about one another, forming parallel left-handed coiled coils. These structures contain characteristic seven-residue (heptad) repeats and may extend up to several tens of nanometers in length.
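The heptad periodicity mentioned above can be illustrated with a minimal sketch. It relies on the conventional labeling of heptad positions as a to g and on the general observation (not stated in this review) that positions a and d of coiled coils are usually hydrophobic. The hydrophobic alphabet, the function names, and the toy sequence are all assumptions made for illustration; this is not a published coiled-coil predictor.

```python
# Assumption: a deliberately simple hydrophobic alphabet, chosen for illustration.
HYDROPHOBIC = set("AILMFVWY")
HEPTAD = "abcdefg"

def heptad_register(length, start="a"):
    """Assign heptad positions a-g to residue indices 0..length-1."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(i + offset) % 7] for i in range(length)]

def core_hydrophobicity(seq, start="a"):
    """Fraction of a/d positions occupied by hydrophobic residues."""
    register = heptad_register(len(seq), start)
    core = [aa for aa, pos in zip(seq, register) if pos in "ad"]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)

def best_register(seq):
    """Heptad position of the first residue that maximizes a/d hydrophobicity."""
    return max(HEPTAD, key=lambda start: core_hydrophobicity(seq, start))

# Hypothetical toy sequence: four copies of an idealized heptad.
toy = "LEELKKK" * 4
print(best_register(toy), core_hydrophobicity(toy, "a"))  # a 1.0
```

A real analysis would score positions with empirically derived residue frequencies and allow for register shifts along the chain; the sketch only shows how a seven-residue period maps onto a sequence.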
By contrast to these fibrous proteins of α-structure, long filaments can be composed of repeated β-structures, such as in the adenovirus fiber protein (van Raaij et al., 1999). The crystal structure of the shaft region of this protein shows that it forms homotrimers with close association of three two-strand repeating units (the triple β-spiral fold). Consequently, beyond its construction from β-structure rather than from α-helices, it is similar to three-chain coiled-coil filaments.
TABLE II
Examples of Frequently Occurring Repeat Families

| Repeat | Ref1 | L | 3D | PDB | Ref2 | Distribution | Function | Pfam |
|---|---|---|---|---|---|---|---|---|
| Kelch | Neer et al. (1994) | 40 | -Barrel | 1gof | Ito et al. (1991) | Eukaryotic | Enzyme. Protein processing | PF01344 |
| Fibroblast growth factor | Murzin et al. (1992) | 40 | β-Trefoil | 2afg_A | Eriksson et al. (1993) | Eukaryotic, viral | Development | PF00167 |
| Tetratricopeptide repeats | Zhang et al. (1991) | 34 | α-α | 1a17 | Das et al. (1998) | Eukaryotic, bacterial, archaeal | PPI | PF00515 |
| Ankyrin | Lux et al. (1990) | 33 | α-α-β-Hairpin | 1awc_B | Batchelor et al. (1998) | Eukaryotic, bacterial, viral | PPI | PF00023 |
| HEAT | Andrade and Bork (1995) | 47 | α-α | 1b3u_A | Groves et al. (1999) | Eukaryotic | PPI | None |
| Leucine-rich repeats | Kobe and Deisenhofer (1994) | 20 | β-α | 1dfj_I | Kobe and Deisenhofer (1995) | Eukaryotic, bacterial | PPI | PF00560 |

Note. Abbreviations used: Repeat, name of the repeat; Ref1, the original description and/or characterization of the repeat in the literature; L (length), average length of the repeat in amino acids; 3D, fold category; PDB, the PDB identifier of the structure shown in Ref2; Distribution, phyletic distribution of the repeat family; Function, summary of the function of the family (PPI, protein-protein interaction); Pfam, identifier of the corresponding entry in the Pfam database.

Other repeat families provide additional insights into the evolution of repeated structures. Filaments may also be built from short repeats of only a few residues. Spider silk proteins contain numerous glycine-rich repeats: GPGG(X)n β-turn spiral and GGX 3_10 helix repeats; here, X denotes any residue. Interestingly, only a subset of silk protein genes contain introns, but these introns show even greater average sequence identity among themselves (87%) than do the exons (73%) (Hayashi and Lewis, 2000). One explanation for this is that the coding regions have undergone accelerated evolution (Hill and Hastie, 1987), due to extreme selective pressures arising from the importance of these genes to the spider's survival. Meanwhile, the conservation of introns is associated with rapid internal duplication of gene portions, due in part to slippage during replication. Thus, rapid internal gene duplications and mutation might also account, although to lesser extents, for many other repetitive proteins, including each of those discussed previously.
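The two glycine-rich silk motifs named above lend themselves to a simple pattern search. The following is a minimal sketch, assuming that GPGG is matched literally and that GGX triplets are counted only when they do not form the tail of a GPGG motif; the function name and the toy sequence are hypothetical and are not taken from any silk gene.

```python
import re

def count_silk_motifs(seq):
    """Count non-overlapping GPGG motifs and stand-alone GGX triplets."""
    gpgg = len(re.findall(r"GPGG", seq))
    # The lookbehind skips a GG that is the tail of a GPGG motif (assumed convention).
    ggx = len(re.findall(r"(?<!GP)GG[A-Z]", seq))
    return {"GPGG": gpgg, "GGX": ggx}

# Hypothetical toy sequence, for illustration only.
print(count_silk_motifs("GPGGYGPGGSGGAGGL"))  # {'GPGG': 2, 'GGX': 2}
```

Counts of this kind say nothing about homology; they only locate candidate motifs, and a real survey would need to handle motif variants and overlapping matches explicitly.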
Flocculation in yeast is mediated, in part, by flocculins which, in Saccharomyces cerevisiae, contain at least four flocculin repeats. The only exception to this is YHR213w, whose hypothetical translation product contains a single flocculin repeat. Examination of the genomic sequence of yeast chromosome VIII around YHR213w indicates that the similarity to a neighboring flocculin gene (Flo5) extends beyond both the N- and C-terminal ends of the open reading frame over a number of stop codons. This is a clear indication of a pseudogene, and it is identified as such in a S. cerevisiae database (the Munich Information Centre for Protein Sequences, www.mips.biochem.mpg.de/proj/yeast/). This is an example where a possible error in the predicted gene structure may be highlighted when a conceptual translation of a genomic sequence presents an unusual domain architecture (defined as the sequential arrangement of domains, repeats, and motifs).
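The reasoning used for YHR213w (similarity to a functional gene continuing past the ends of the annotated open reading frame, across stop codons) can be sketched as a simple check. This is a heuristic illustration under stated assumptions, not the procedure used by the authors or by MIPS: the region is assumed to be already delimited by similarity to the functional homologue, the reading frame is taken from the annotated ORF start, and the function names and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_stops(dna, frame=0):
    """Return 0-based offsets of stop codons read in the given frame."""
    dna = dna.upper()
    return [i for i in range(frame, len(dna) - 2, 3) if dna[i:i + 3] in STOP_CODONS]

def looks_like_pseudogene(region, orf_start, orf_end):
    """True if in-frame stops occur outside the annotated ORF.

    region    : DNA that is similar to a functional homologue (assumption)
    orf_start : offset of the ORF's first codon within region
    orf_end   : offset just past the ORF's own stop codon
    """
    stops = in_frame_stops(region, frame=orf_start % 3)
    return any(s < orf_start or s >= orf_end for s in stops)

# Hypothetical toy regions: flank + ORF (ATG GGG CCC TAA) + flank.
disrupted = "CCCTAGAAA" + "ATGGGGCCCTAA" + "GGGTGACCC"
intact = "CCCAAAAAA" + "ATGGGGCCCTAA" + "GGGGGACCC"
print(looks_like_pseudogene(disrupted, 9, 21))  # True
print(looks_like_pseudogene(intact, 9, 21))     # False
```

In practice the flanking similarity would first be established by a sequence comparison, and frameshifts as well as stop codons would need to be considered.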
TABLE III
Other Less Frequently Occurring Repeat Families

| Repeat | Ref1 | L | 3D | PDB | Ref2 | Distribution | Function | Pfam |
|---|---|---|---|---|---|---|---|---|
| -Farnesyl transferase | Park et al. (1997) | 42 | -Barrel | 1ft2b | Park et al. (1997) | Eukaryotic | Enzyme. Protein processing | None |
| Adenovirus fiber protein | Green et al. (1983) | 15 | Triple β-spiral | 1qiu | van Raaij et al. (1999) | Viral | PPI. Binds to host receptor | None |
| Zein | Argos et al. (1982) | 20 | α-Helix (proposed) | Model | Matsushima et al. (1997) | Plants | Plant seed storage protein | PF01559 |
| Bacterial glycosyl transferase | Wren (1991) | 35 | Unknown | None | | Bacterial | Enzyme. Small molecules binding | None |
| Insect antifreeze protein | Graham et al. (1997) | 12 | β-sheet | 1ezg_A | Liou et al. (2000) | Metazoa | Ice binding. Antifreeze | None |
| Ice nucleation protein | Gurian-Sherman and Lindow (1993) | 16 | Hairpin-loop | 1ina | Tsuda et al. (1997) | Bacterial | Catalyst of ice formation | PF00818 |
| Nebulin | Pfuhl et al. (1996) | 35 | α-Helix (proposed) | None | | Metazoa | PPI. Binds to F-actin | PF00880 |
| Notch/lin-12 | Wharton et al. (1985) | 31 | Unknown | None | | Metazoa | PPI. Lateral inhibition of development processes | PF00066 |
| Plectin | Wiche et al. (1991) | 38 | Unknown | None | | Metazoa | PPI. Cytoskeleton. Cell adhesion. Antigens | PF00681 |
| Spectrin | Speicher and Marchesi (1984) | 106 | Three-helix bundle | 1cun | Pascual et al. (1997) | Metazoa | PPI. Cell shape. Cytoskeleton | PF00435 |
| Annexin | Barton et al. (1991) | 60 | Five-helix bundle | 1ain | Weng et al. (1993) | Eukaryotic | Regulatory. Membrane fusion. Exocytosis | PF00191 |
| Flocculin | Watari et al. (1994) | 45 | Unknown | None | | S. cerevisiae | Regulatory of flocculation | PF00624 |
| Major vault protein | Vasu et al. (1993) | 52 | Unknown | None | | Eukaryotic | Multidrug resistance | PF01505 |

Note. The columns are defined as in Table II. Here the Notch repeat is also called lin12.
The more pervasive functions displayed by repeat ensembles are catalysis and protein-protein recognition. However, a repetitive structure can be used for other tasks. The multiplicity of repeats that mimic water structure is a good example of the functional flexibility that can be acquired via protein repeat evolution. On one hand, insects and plants protect themselves from freezing using proteins with repeats that impede ice formation (Liou et al., 2000; Worrall et al., 1998). On the other hand, bacterial proteins use different repeat types to favor the formation of ice as a mechanism of weakening an infected plant (Gurian-Sherman and Lindow, 1993).
A more passive function is played by the repeats of the plant storage proteins, α-prolamins. First identified in maize zein proteins (Argos et al., 1982), these repeats are likely to form a layer of helices packed in a hexagonal arrangement (Matsushima et al., 1997). In this case, the structure of the repeat bears little relation to its organismal function, since it is the unusual composition of nitrogen-rich amino acids that is required for its seed germination properties.
The vault is a ribonucleoprotein particle observed in higher and lower eukaryotes. Its function remains unclear, but its elevated expression in cancer cell lines seems to be related to multidrug resistance (Kickhoefer et al., 1998). The whole particle is hollow, which suggested that drugs may be sequestered from their targets inside the particle (Kong et al., 1999); 78% of the total mass of the particle is composed of 96 copies of the MVP (major vault protein; Vasu et al., 1993). MVP homologues display seven copies of a 52-amino-acid repeat. These numbers resemble those of repeats present in β-propellers, in particular RCC1 repeats (Renault et al., 1998), suggesting that MVP repeats may also form a similar closed structure.
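The repeat numbers quoted for the vault can be combined with a short calculation. The sketch below uses only the figures given in the text (seven repeats of 52 amino acids per MVP chain, 96 MVP copies per particle); the function name is hypothetical, and no molecular masses are assumed.

```python
def mvp_repeat_counts(repeats_per_chain=7, repeat_length=52, chains_per_vault=96):
    """Return (residues within repeats per MVP chain, MVP repeats per vault)."""
    residues_in_repeats = repeats_per_chain * repeat_length
    repeats_per_vault = repeats_per_chain * chains_per_vault
    return residues_in_repeats, repeats_per_vault

print(mvp_repeat_counts())  # (364, 672)
```

On these figures, the repeats would cover 364 residues of each MVP chain, and a single vault would carry 672 copies of the repeat.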
CONCLUSIONS
Our survey of protein repeats has highlighted the multifunctionality of repeat types, their structural differences, and their proliferations in different evolutionary lineages. One likely reason for their evolutionary success is that repeat-containing proteins are relatively "cheap" to evolve. By this we mean that large and thermodynamically stable proteins may arise by the simple expedient of intragenic duplications, rather than the more complex processes of de novo α-helix and β-sheet creation. This is supported by the larger sizes of most repeat-containing structures relative to compact domains (Fig. 4).
This does not, of course, present a complete answer to their success since it addresses the question of how repeat-containing proteins arose, rather than why they have been selected for and fixed in evolutionary lineages on so many separate occasions. As suggested throughout this review, the reasons for the functional successes of repeat classes may be a proclivity of repeat assemblies to acquire different molecular functions, namely, the association with different protein ligands. This, in turn, might be associated with the large solvent-accessible surface areas, presented by extended "open" assemblies, that are available for interactions with ligands. This is because burial of nonpolar residues at protein-protein interfaces is thought to be an important contributor to heterodimer stability (Tsai et al., 1997).
FIG. 4. Distribution of domain size in known structures. The bold line indicates the average size of domains, of approximately 100 amino acids (Wheelan et al., 2000). Repeats and their corresponding PDB codes are shown (from left to right). Closed structures: kelch, 1gof; glucose dehydrogenase-B, 1c9u; hemopexin, 1qjs; fibroblast growth factor, 2afg; open structures: LRR/typical, 1a9n; LRR/ribose inhibitor, 1a4y; LRV, 1lrv; TPR, 1a17; ankyrin, 1awc_B; armadillo, 1bk5; HEAT, 1b3u_A; VHS, 1elk_A; annexin, 1ain; adenovirus fibrous protein, 1qiu; IAFP, 1ezg.

In understanding the evolution of repeats, one major problem remains. Repeats are defined as occurring multiply, and all repeats in a family are homologous. This means that these repeats all evolved from a common ancestor, which necessarily must have contained only a single repeat. This is apparently contradictory, since it is not expected that a single repeat could exist in isolation, as a single folded functional unit. Rescue is at hand if one suggests that the family's common ancestor indeed represented a single repeat, but one that formed homooligomers. The homooligomeric structure of the ancestor might mirror that of the intrachain repetitive structure of its modern homologue, except in its multichain character. This scenario has recently been suggested for the evolution of the β-trefoil fold (Ponting and Russell, 2000).
A problem with this proposal is that there are few, if any, known examples where homologous multirepeat assemblies are formed both from oligomers of single repeats and from a single chain of multiple repeats. However, this might not be too surprising since the highly cooperative process of folding a multirepeat protein must be significantly more favorable than folding a homooligomeric protein from its constituent monomers. This is because the kinetic folding pathways of multirepeat protein structures may be nucleated at many positions. In this way ancient oligomeric single-repeat proteins might have been driven to extinction by their monomeric multiple repeat-containing homologues.
REFERENCES
Abergel, C., Bouveret, E., Claverie, J. M., Brown, K., Rigal, A., Lazdunski, C., and Benedetti, H. (1999) Structure of the Escherichia coli TolB protein determined by MAD methods at 1.95 Å resolution, Struct. Fold Des. 7, 1291.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res. 25, 3389-3402.
Andrade, M. A., and Bork, P. (1995) HEAT repeats in the Huntington's disease protein, Nature Genet. 11, 115-116.
Andrade, M. A., Petosa, C., O'Donoghue, S. I., Muller, C. W., and Bork, P. (2001) Comparison of ARM and HEAT protein-repeats, J. Mol. Biol. 309, 1-18.
Andrade, M. A., Ponting, C., Gibson, T., and Bork, P. (2000) Identification of protein repeats and statistical significance of sequence comparisons, J. Mol. Biol. 298, 521-537.
Argos, P., Pedersen, K., Marks, M. D., and Larkins, B. A. (1982) A structural model for maize zein proteins, J. Biol. Chem. 257, 9984-9990.
Bairoch, A., and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000, Nucleic Acids Res. 28, 45-48.
Barton, G. J., Newman, R. H., Freemont, P. S., and Crumpton, M. J. (1991) Amino acid sequence analysis of the annexin super-gene family of proteins, Eur. J. Biochem. 198, 749-760.
Batchelor, A. H., Piper, D. E., de la Brousse, F. C., McKnight, S. L., and Wolberger, C. (1998) The structure of GABPα/β: An ETS domain-ankyrin repeat heterodimer bound to DNA, Science 279, 1037-1041.
Bateman, A., Birney, E., Durbin, R., Eddy, S. R., Howe, K. L., and Sonnhammer, E. L. (2000) The Pfam protein families database, Nucleic Acids Res. 28, 263-266.
Bayliss, R., Littlewood, T., and Stewart, M. (2000) Structural basis for the interaction between FxFG nucleoporin repeats and importin-beta in nuclear trafficking, Cell 102, 99-108.
Boguski, M. S., Murray, A. W., and Powers, S. (1992) Novel repetitive sequence motifs in the alpha and beta subunits of prenyl-protein transferases and homology of the subunit to the MAD2 gene product of yeast, New Biol. 4, 408-411.
Bork, P., and Gibson, T. J. (1996) Applying motif and profile searches, Methods Enzymol. 266, 162-184.
Burkhard, P., Stetefeld, J., and Strelkov, S. V. (2001) Coiled coils: A highly versatile protein folding motif, Trends Cell Biol. 11, 82-88.
Cingolani, G., Petosa, C., Weis, K., and Muller, C. W. (1999) Structure of importin-β bound to the IBB domain of importin-α, Nature 399, 221-229.
Chook, Y. M., and Blobel, G. (1999) Structure of the nuclear transport complex karyopherin-beta2-Ran GppNHp, Nature 399, 230-237.
Conti, E., Uy, M., Leighton, L., Blobel, G., and Kuriyan, J. (1998) Crystallographic analysis of the recognition of a nuclear localization signal by the nuclear import factor karyopherin alpha, Cell 94, 193-204.
Das, A. K., Cohen, P. W., and Barford, D. (1998) The structure of the tetratricopeptide repeats of protein phosphatase 5: Implications for TPR-mediated protein-protein interactions, EMBO J. 17, 1192-1199.
Diederich, R. J., Matsuno, K., Hing, H., and Artavanis-Tsakonas, S. (1994) Cytosolic interaction between deltex and Notch ankyrin repeats implicates deltex in the Notch signaling pathway, Development 120, 473-481.
Eddy, S. (1998) Profile hidden Markov models, Bioinformatics 14, 755-763.
Eriksson, A. E., Cousens, L. S., and Matthews, B. W. (1993) Refinement of the structure of human basic fibroblast growth factor at 1.6 Å resolution and analysis of presumed heparin binding sites by selenate substitution, Protein Sci. 2, 1274-1284.
Fulop, V., Bocskei, Z., and Polgar, L. (1998) Prolyl oligopeptidase: An unusual β-propeller domain regulates proteolysis, Cell 94, 161-170.
Gindhart, J. G., Jr., and Goldstein, L. S. (1996) Tetratrico peptide repeats are present in the kinesin light chain, Trends Biochem. Sci. 21, 52-53.
Gorina, S., and Pavletich, N. P. (1996) Structure of the p53 tumor suppressor bound to the ankyrin and SH3 domains of 53BP2, Science 274, 1001-1005.
Graham, L. A., Liou, Y. C., Walker, V. K., and Davies, P. L. (1997) Hyperactive antifreeze protein from beetles, Nature 388, 727-728.
Green, N. M., Wrigley, N. G., Russell, W. C., Martin, S. R., and McLachlan, A. D. (1983) Evidence for a repeating cross-beta sheet structure in the adenovirus fibre, EMBO J. 2, 1357-1365.
Groves, M. R., Hanlon, N., Turowski, P., Hemmings, B. A., and Barford, D. (1999) The structure of the protein phosphatase 2A PR65/A subunit reveals the conformation of its 15 tandemly repeated HEAT motifs, Cell 96, 99-110.
Gurian-Sherman, D., and Lindow, S. E. (1993) Bacterial ice nucleation: Significance and molecular basis, FASEB J. 7, 1338-1343.
Hayashi, C. Y., and Lewis, R. V. (2000) Molecular architecture and evolution of a modular spider silk protein gene, Science 287, 1477-1479.
Heger, A., and Holm, L. (2000) Rapid automatic detection and alignment of repeats in protein sequences, Proteins 41, 224-237.
Hildebrand, A., Romaris, M., Rasmussen, L. M., Heinegard, D., Twardzik, D. R., Border, W. A., and Ruoslahti, E. (1994) Interaction of the small interstitial proteoglycans biglycan, decorin and fibromodulin with transforming growth factor beta, Biochem. J. 302, 527-534.
Hill, R. E., and Hastie, N. D. (1987) Accelerated evolution in the reactive centre regions of serine protease inhibitors, Nature 326, 96-99.
Hirano, K., Phan, B. C., and Hartshorne, D. J. (1994) Interactions of the subunits of smooth muscle myosin phosphatase, J. Biol. Chem. 272, 3683-3688.
Iozzo, R. V. (1998) Matrix proteoglycans: From molecular design to cellular function, Annu. Rev. Biochem. 67, 609-652.
Ito, N., Phillips, S. E., Stevens, C., Ogel, Z. B., McPherson, M. J., Keen, J. N., Yadav, K. D., Ju, B. G., Jeong, S., Bae, E., Hyun, S., Carroll, S. B., Yim, J., and Kim, J. (2000) Fringe forms a complex with Notch, Nature 405, 191-195.
Kickhoefer, V. A., Rajavel, K. S., Scheffer, G. L., Dalton, W. S., Scheper, R. J., and Rome, L. H. (1998) Vaults are up-regulated in multidrug-resistant cancer cell lines, J. Biol. Chem. 273, 8971-8974.
Kobe, B., and Deisenhofer, J. (1994) The leucine-rich repeat: A versatile binding motif, Trends Biochem. Sci. 19, 415-421.
Kobe, B., and Deisenhofer, J. (1995) A structural basis of the interactions between leucine-rich repeats and protein ligands, Nature 374, 183-186.
Kong, L. B., Siva, A. C., Rome, L. H., and Stewart, P. L. (1999) Structure of the vault, a ubiquitous cellular component, Structure 7, 371-379.
Lamb, J. R., Tugendreich, S., and Hieter, P. (1995) Tetratrico peptide repeat interactions: To TPR or not to TPR? Trends Biochem. Sci. 20, 257-259.
Lapouge, K., Smith, S. J. M., Walker, P. A., Gamblin, S. J., Smerdon, S. J., and Rittinger, K. (2000) Structure of the TPR domain of p67phox in complex with Rac-GTP, Mol. Cell 6, 899-907.
Liou, Y. C., Tocilj, A., Davies, P. L., and Jia, Z. (2000) Mimicry of ice structure by surface hydroxyls and water of a beta-helix antifreeze protein, Nature 406, 322-324.
Lo Conte, L., Ailey, B., Hubbard, T. J., Brenner, S. E., Murzin, A. G., and Chothia, C. (2000) SCOP: A structural classification of proteins database, Nucleic Acids Res. 28, 257-259.
Lohi, O., and Lehto, V. P. (1998) VHS domain marks a group of proteins involved in endocytosis and vesicular trafficking, FEBS Lett. 440, 255-257.
Lux, S. E., John, K. M., and Bennett, V. (1990) Analysis of cDNA for human erythrocyte ankyrin indicates a repeated structure with homology to tissue-differentiation and cell-cycle control proteins, Nature 344, 36-42.
Malik, H. S., Eickbush, T. H., and Goldfarb, D. S. (1997) Evolutionary specialization of the nuclear targeting apparatus, Proc. Natl. Acad. Sci. USA 94, 13738-13742.
Mao, Y., Nickitenko, A., Duan, X., Lloyd, T. E., Wu, M. N., Bellen, H., and Quiocho, F. A. (2000) Crystal structure of the VHS and FYVE tandem domains of Hrs, a protein involved in membrane trafficking and signal transduction, Cell 100, 447-456.
Marcotrigiano, J., Lomakin, I. B., Sonenberg, N., Pestova, T. V., Hellen, C. U. T., and Burley, S. K. (2001) A conserved HEAT domain within eIF4G directs assembly of the translation initiation machinery, Mol. Cell 7, 193-203.
Marcotte, E. M., Pellegrini, M., Yeates, T. O., and Eisenberg, D. (1999) A census of protein repeats, J. Mol. Biol. 293, 151-160.
Marino, M., Braun, L., Cossart, P., and Ghosh, P. (1999) Structure of the InlB leucine-rich repeats, a domain that triggers host cell invasion by the bacterial pathogen L. monocytogenes, Mol. Cell 4, 1063-1072.
Matsushima, N., Danno, G., Takezawa, H., and Izumi, Y. (1997) Three-dimensional structure of maize alpha-zein proteins studied by small-angle X-ray scattering, Biochim. Biophys. Acta 1339, 14-22.
McLachlan, A. D. (1983) Analysis of gene duplication repeats in the myosin rod, J. Mol. Biol. 169, 15-30.
Mott, R., and Tribe, R. (1999) Approximate statistics of gapped alignments, J. Comput. Biol. 6, 91-112.
Mott, R. (2000) Accurate formula for P-values of gapped local sequence and profile alignments, J. Mol. Biol. 300, 649-659.
Murzin, A. G. (1999) Structural principles for the propeller assembly of beta-sheets: The preference for seven-fold symmetry, Proteins 14, 191-201.
Murzin, A. G., Lesk, A. M., and Chothia, C. (1992) β-Trefoil fold. Patterns of structure and sequence in the Kunitz inhibitors interleukins-1 beta and 1 alpha and fibroblast growth factors, J. Mol. Biol. 223, 531-543.
Neer, E. J., Schmidt, C. J., Nambudripad, R., and Smith, T. F. (1994) The ancient regulatory-protein family of WD-repeat proteins, Nature 371, 297-300.
Neer, E. J., and Smith, T. F. (1996) G protein heterodimers: New structures propel new questions, Cell 84, 175-178.
Neuwald, A. F., and Hirano, T. (2000) HEAT repeats associated with condensins, cohesins, and other complexes involved in chromosome-related functions, Genome Res. 10, 1445-1452.
Oh, B., Hwang, S. Y., Solter, D., and Knowles, B. B. (1997) Spindlin, a major maternal transcript expressed in the mouse during the transition from oocyte to embryo, Development 124, 493-503.
Ohno, S. (1970) Evolution by Gene Duplication, Springer-Verlag, Berlin.
Ordway, R. W., Pallanck, L., and Ganetzky, B. (1994) A TPR domain in the SNAP secretory proteins, Trends Biochem. Sci. 19, 530-531.
Pascual, J., Pfuhl, M., Walther, D., Saraste, M., and Nilges, M. (1997) Solution structure of the spectrin repeat: A left-handed antiparallel triple-helical coiled-coil, J. Mol. Biol. 273, 740-751.
Peifer, M., Berg, S., and Reynolds, B. (1994) A repeating amino acid motif shared by proteins with diverse cellular roles, Cell 76, 789-791.
Pellegrini, M., Marcotte, E. M., and Yeates, T. O. (1999) A fast algorithm for genome-wide analysis of proteins with repeated structures, Proteins 35, 440-446.
Penkett, C. J., Dobson, C. M., Smith, L. J., Bright, J. R., Pickford, A. R., Campbell, I. D., and Potts, J. R. (2000) Identification of residues involved in the interaction of Staphylococcus aureus fibronectin-binding protein with the (4)F1(5)F1 module pair of human fibronectin using heteronuclear NMR spectroscopy, Biochemistry 39, 2887-2893.
Peters, J. W., Stowell, M. H., and Rees, D. C. (1996) A leucine-rich repeat variant with a novel repetitive protein structural motif, Nat. Struct. Biol. 3, 991-994.
Pfuhl, M., Winder, S. J., Castiglione Morelli, M. A., Labeit, S., and Pastore, A. (1996) Correlation between conformational and binding properties of nebulin repeats, J. Mol. Biol. 257, 367-384.
Ponting, C. P. (2000) Proteins of the endoplasmic-reticulum-associated degradation pathway: Domain detection and function prediction, Biochem. J. 351, 527-535.
Ponting, C. P., and Pallen, M. J. (1999a) A beta-propeller domain within TolB, Mol. Microbiol. 31, 739-740.
Ponting, C. P., and Pallen, M. J. (1999b) β-propeller repeats and a PDZ domain in the tricorn protease: Predicted self-compartmentalisation and C-terminal polypeptide-binding strategies of substrate selection, FEMS Microbiol. Lett. 179, 447-451.
Ponting, C. P., Aravind, L., Schultz, J., Bork, P., and Koonin, E. V. (1999) Eukaryotic signalling domain homologues in archaea and bacteria: Ancient ancestry and horizontal gene transfer, J. Mol. Biol. 289, 729-745.
Ponting, C. P., and Birney, E. (2000) Identification of domains from protein sequences, in Webster, D. (Ed.), Protein Structure Prediction, Humana Press, Clifton, NJ.
Ponting, C. P., and Russell, R. B. (2000) Identification of distant homologues of fibroblast growth factors suggests a common ancestor for all β-trefoil proteins, J. Mol. Biol. 302, 1041-1047.
Preker, P. J., and Keller, W. (1998) The HAT helix, a repetitive motif implicated in RNA processing, Trends Biochem. Sci. 23, 15-16.
Price, S. R., Evans, P. R., and Nagai, K. (1998) Crystal structure of the spliceosomal U2B''-U2A' protein complex bound to a fragment of U2 small nuclear RNA, Nature 394, 645-650.
Ray, M.-C., Germon, P., Vianney, A., Portalier, R., and Lazzaroni, J. C. (2000) Identification by genetic suppression of Escherichia coli TolB residues important for TolB-Pal interaction, J. Bacteriol. 182, 821-824.
Renault, L., Nassar, N., Vetter, I., Becker, J., Klebe, C., Roth, M., and Wittinghofer, A. (1998) The 1.7 Å crystal structure of the regulator of the chromosome condensation (RCC1) reveals a seven-bladed propeller, Nature 392, 97-101.
Riggleman, B., Wieschaus, E., and Schedl, P. (1989) Molecular analysis of the armadillo locus: Uniformly distributed transcripts and a protein with novel internal repeats are associated with a Drosophila segment polarity gene, Genes Dev. 3, 96-113.
Russell, R. B., Sasieni, P. D., and Sternberg, M. J. E. (1998) Supersites within superfolds. Binding site similarity in the absence of homology, J. Mol. Biol. 282, 903-918.
Saupe, S., Turcq, B., and Begueret, J. (1995) A gene responsible for vegetative incompatibility in the fungus Podospora anserina encodes a protein with a GTP-binding motif and G beta homologous domain, Gene 162, 135-139.
Scheufler, C., Brinker, A., Bourenkov, G., Pegoraro, S., Moroder, L., Bartunik, H., Hartl, F. U., and Moarefi, I. (2000) Structure of TPR domain-peptide complexes: Critical elements in the assembly of the Hsp70-Hsp90 multichaperone machine, Cell 101, 199-210.
Sedgwick, S. G., and Smerdon, S. J. (1999) The ankyrin repeat: A diversity of interactions on a common structural framework, Trends Biochem. Sci. 24, 311-316.
Schuler, G. D., Altschul, S. F., and Lipman, D. J. (1991) A workbench for multiple alignment construction and analysis, Proteins 9, 180-190.
Schultz, J., Doerks, T., Ponting, C. P., Copley, R. R., and Bork, P. (2000) More than 1000 putative novel human signalling pro-
tei ns reveal ed by ESTdata mi ni ng, NatureGenet. 25, 201204.
Schul tz, J., Mi l petz, F., Bork, P., and Ponti ng, C. P. (1998)
SMART, a si mpl e modul ar archi tecture research tool : I denti -
cati on of si gnal i ng domai ns, Proc. Natl. Acad. Sci. USA 95,
58575864.
Shen, Y., Romo, G. M., Dong, J., Schade A., McI nti re, L. V.,
Kenny, D., Whi sstock, J. C., Berndt, M. C., Lopez, J. A., and
Andrews, R. K. (2000) Requi rement of l euci ne-ri ch repeats of
gl ycoprotei n (GP) I b for shear-dependent and stati c bi ndi ng of
von Wi l l ebrand factor to the pl atel et membrane GP I b-I X-V
compl ex, Blood 95, 903910.
Si korski , R. S., Boguski , M. S., Goebl , M., and Hi eter, P. (1990) A
repeati ng ami no aci d moti f i n CDC23 denes a fami l y of pro-
tei ns and a new rel ati onshi p among genes requi red for mi tosi s
and RNA synthesi s, Cell 60, 307317.
Sondek, J., Bohm, A., Lambri ght, D. G., Hamm, H. E., and Si gl er,
P. B. (1996) Crystal structure of a G
A
protei n di mer at 2.1
resol uti on, Nature379, 369374.
Sonnhammer, E. L., and Durbi n, R. (1995) A dot-matri x program
wi th dynami c threshol d control sui ted for genomi c DNA and
protei n sequence anal ysi s, Gene167, GC110.
Spei cher, D. W., and Marchesi , V. T. (1984) Erythrocyte spectri n
i s compri sed of many homol ogous tri pl e hel i cal segments, Na-
ture311, 177180.
Spri nger, T. A. (1997) Fol di ng of the N-termi nal , l i gand-bi ndi ng
regi on of i ntegri n al pha-subuni ts i nto a beta-propel l er domai n,
Proc. Natl. Acad. Sci. USA 94, 6572.
Spri nger, T. A. (1998) An extracel l ul ar beta-propel l er modul e
predi cted i n l i poprotei n and scavenger receptors, tyrosi ne ki -
nases, epi dermal growth factor precursor, and extracel l ul ar
matri x components, J . Mol. Biol. 283, 837862.
Svensson, L., Narl i d, I ., and Ol dberg, A. (2000) Fi bromodul i n and
l umi can bi nd to the same regi on on col l agen type I bri l s, FEBS
Lett. 470, 178182.
Tsai , C. J., Li n, S. L, Wol fson, H. J., and Nussi nov, R. (1997)
Studi es of protei n-protei n i nterfaces: A stati sti cal anal ysi s of
the hydrophobi c effect, Protein Sci. 6, 5364.
Tsuda, S., I to, A., and Matsushi ma, N. (1997) A hai rpi n-l oop
conformati on i n tandem repeat sequence of the i ce nucl eati on
protei n reveal ed by NMR spectroscopy, FEBS Lett. 409, 227
231.
Van Der Bi ezen, E. A., and Jones, J. D. G. (1998) Pl ant di sease-
resi stance protei ns and the gene-for-gene concept, Trends Bio-
chem. Sci. 23, 454456.
van Raai j, M. J., Mi traki , A., Lavi gne, G., and Cusack, S. (1999)
A tri pl e beta-spi ral i n the adenovi rus bre shaft reveal s a new
structural moti f for a brous protei n, Nature401, 935938.
Vasu, S. K., Kedersha, N. L., and Rome, L. H. (1993) cDNA
cl oni ng and di srupti on of the major vaul t protei n al pha gene
(mvpA) i n Di ctyostel i um di scoi deum, J . Biol. Chem. 268,
15356153560.
Wal ker, E. H., Peri si c, O., Ri ed, C., Stephens, L., and Wi l l i ams,
R. L. (1999) Structural i nsi ghts i nto phosphoi nosi ti de 3-ki nase
catal ysi s and si gnal l i ng, Nature402, 313320.
Watari , J., Takata, Y., Ogawa, M., Sahara, H., Koshi no, S., On-
nel a, M. L., Ai raksi nen, U., Jaati nen, R., Pentti l a, M., and
Keranen, S. (1994) Mol ecul ar cl oni ng and anal ysi s of the yeast
occul ati on gene FLO1, Yeast 10, 211225.
Weng, X., Luecke, H., Song, I . S., Kang, D. S., Ki m, S. H., and
Huber, R. (1993) Crystal structure of human annexi n I at 2.5 A
resol uti on, Protein Sci. 2, 448458.
Wharton, K. A., Johansen, K. M., Xu, T., and Artavani s-Tsako-
nas, S. (1985) Nucl eoti de sequence from the neurogeni c l ocus
notch i mpl i es a gene product that shares homol ogy wi th pro-
tei ns contai ni ng EGF-l i ke repeats, Cell 43, 567581.
130 ANDRADE, PEREZ-I RATXETA, AND PONTI NG
Wheel an, S., Marchl er-Bauer, A., and Bryant, S. H. (2000) Do-
mai n si ze di stri buti on can predi ct domai n boundari es, Bioin-
formatics 16, 613619.
Wi che, G., Becker, B., Luber, K., Wei tzer, G., Castanon, M. J.,
Hauptmann, R, Stratowa, C., and Stewart, M. (1991) Cl oni ng
and sequenci ng of rat pl ecti n i ndi cates a 466-kD pol ypepti de
chai n wi th a three-domai n structure based on a central al pha-
hel i cal coi l ed coi l , J . Cell. Biol. 114, 8399.
Worral l , D., El i as, L., Ashford, D., Smal l wood, M., Si debottom, C.,
Li l l ford, P., Tel ford, J., Hol t, C., and Bowl es, D. (1998) A carrot
l euci ne-ri ch-repeat protei n that i nhi bi ts i ce recrystal l i zati on,
Science282, 115117.
Wren, B. W. (1991) A fami l y of cl ostri di al and streptococcal l i -
gand-bi ndi ng protei ns wi th conserved C-termi nal repeat se-
quences, Mol. Microbiol. 5, 797803.
Zhang, K., Smouse, D., and Perri mon, N. (1991) The crooked neck
gene of Drosophila contai ns a moti f found i n a fami l y of yeast
cycl e genes, Genes Dev. 5, 10801091.
131 PROTEI N REPEATS: STRUCTURES, FUNCTI ONS, AND EVOLUTI ON
