
Nucleic Acids Research Advance Access published April 5, 2011

Nucleic Acids Research, 2011, 1–8, doi:10.1093/nar/gkr203

Cryptic splice sites and split genes

Yuri Kapustin1,*, Elcie Chan2, Rupa Sarkar2, Frederick Wong2, Igor Vorechovsky3, Robert M. Winston2, Tatiana Tatusova1 and Nick J. Dibb2,*
1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, USA, 2Institute of Reproductive and Developmental Biology, Imperial College London, Du Cane Road, London W12 0NN and 3Division of Human Genetics, University of Southampton Medical School, Southampton SO16 6YD, UK
Received October 6, 2010; Revised March 17, 2011; Accepted March 22, 2011

ABSTRACT

We describe a new program called cryptic splice finder (CSF) that can reliably identify cryptic splice sites (css), thus providing a useful tool for investigating splicing mutations in genetic disease. We report that many css are not entirely dormant and are often already active at low levels in normal genes before their enhancement in genetic disease. We also report a fascinating correlation between the positions of css and introns, whereby css within the exons of one species frequently match the exact position of introns in equivalent genes from another species. These results strongly indicate that many introns were inserted into css during evolution, and they also imply that the splicing information that lies outside some introns can be independently recognized by the splicing machinery and was in place before intron insertion. This indicates that non-intronic splicing information had a key role in shaping the split structure of eukaryotic genes.

INTRODUCTION

Eukaryotic genomes contain large numbers of splice sites, known as cryptic splice sites (css), which are generally held to be disadvantageous sites that are dormant or used only at low levels unless activated by mutation of nearby authentic or advantageous splice sites (1,2). Once activated, css may be used extremely efficiently, resulting in a wide range of genetic diseases (3–5). It is generally accepted that css are suppressed by nearby stronger splice sites and that splice site selection can be viewed as a competition between the various potential splice sites in a pre-mRNA for the splicing machinery (1,2). For genes with many introns, it is suspected that up to 50% of disease-causing mutations act by affecting splicing, either through the activation of css, exon skipping or disruption of alternative splicing (4–7). Css are found in exons as well as introns, and their recognition by the splicing machinery is similar to splice site recognition in general and depends upon information both at the splice site and outside this region, at enhancer and silencer sequences (8–10).

It is important to be able to predict the positions of css that might be activated in genetic disease, and a number of DNA-sequence scanning programs have been developed for this purpose. Such programs are often highly informative but are handicapped by the complex nature of the nucleotide information that is required to define a splice site (4,8,11,12).

Our previous work indicates a connection between css and introns. We identified a small number of css in the exon regions of actin genes by experiment and discovered that eight out of nine of these exonic css exactly match the positions of introns in actin genes from other species, which led us to conclude that these particular actin introns were inserted into css during evolution (13,14). This finding may help to explain why and how eukaryotes acquired introns; however, it is important to establish whether our results for the actin gene family are generally applicable. We have been unable to identify a DNA-scanning program that reveals the same strong correlation between predicted actin css and intron positions that we observed experimentally (13). However, this is probably because DNA-scanning programs were not designed specifically for this purpose and because of the difficulties such programs face in distinguishing between css and false-positive, non-functional splice sites (12). Here, we describe a program called cryptic splice finder (CSF) that can reliably identify css by EST-to-genome alignment. It does this by identifying transcripts that have been generated through the low-level use of css by normal genes.
Unlike the scanning programs, CSF cannot predict the position of splice sites that are created de novo by gene mutation. However, this program provides a useful complementary resource for studies of genetic disease, and it also enabled us to establish that there is a strong and general correlation between the positioning of css and of introns. The evolutionary implications of this finding are discussed.

Downloaded from by guest on January 13, 2013

*To whom correspondence should be addressed. Tel: +212 813 8774; Fax: +212 826 8280; Email: Correspondence may also be addressed to Nick J. Dibb. Tel: +020 7594 2103; Fax: +020 759 42154; Email:

© The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

MATERIALS AND METHODS

CSF

CSF predicts css based on spliced alignment of ESTs. Each EST is aligned against the genome independently of other ESTs. To be considered by CSF, a gap in the alignment must be flanked by a minimum number (25 by default) of matching residues. CSF searches for EST alignments that form the patterns illustrated in Figure 1A: the majority of ESTs must share a common gap (deletion), and in addition there must be minor transcripts that share only one of the common deletion endpoints. CSF defines the common deletion endpoints as authentic splice sites and the deletion endpoint of the minor transcript(s) as a cryptic or alternative splice site (arrowed). For constitutively spliced genes, these minor deletion endpoints occur very infrequently and are therefore candidate css. For alternatively spliced genes, CSF identifies both css and more frequently occurring alternative splice sites (see Results section). As illustrated in Figure 1A, css can be 5′ or 3′ and upstream or downstream of the alternative authentic splice site. In the majority of cases, the authentic splice sites defined by CSF are the same as the splice sites defined by reference sequences (e.g. NM_000518.4) and so represent commonly used splice sites (see below for exceptions).

In more detail, coordinates of the splice sites from adjacent exon–intron–exon sequences are pooled into four-tuples, which are then loaded into a relational database alongside the data linking them to their alignments. The database runs a query to detect tuple pairs satisfying the css condition: the overlapping introns match at one end and mismatch at the other, with the mismatching intron end from one tuple residing within an exon of the other tuple. For each tuple pair returned by the query, the splice sites of the intron that has more supporting ESTs are declared authentic, and the remaining site is declared cryptic. Splice site coordinates are mapped against the NCBI36/hg18 human genome assembly. As illustrated in Figure 1A, a css detected by CSF is further classified according to whether it is a 5′ or 3′ css and whether it is located in an exon or an intron. The number of transcripts supporting a splice (either authentic or css) is printed, and the complete list of such ESTs is linked under the count. Where applicable, the count is followed by a number in parentheses. For exonic css, parentheses always appear on the left-hand (authentic) side of the CSF report, and for intronic css on the right-hand (cryptic) side. The numbers in parentheses show the transcripts that formally satisfy the css conditions (see Notes about CSF section).

Notes about CSF. CSF lists authentic 5′- and 3′-splice sites only when one of the two authentic splice sites is a more common alternative to another nearby splice site that is listed under the cryptic (alternative) column. This means that CSF lists only a subset of the total number of authentic splice sites for any one gene.
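The css condition described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the CSF implementation (which runs the condition as a relational-database query over four-tuples). It assumes plus-strand genomic coordinates (1-based, inclusive) and represents each EST alignment as a list of aligned exon blocks. The function names, the pooling of the widest flanking exon seen for each intron, the tie-breaking rule (the intron with smaller coordinates is treated as authentic when support is equal) and the use of block length as a stand-in for "matching residues" are all hypothetical simplifications. For minus-strand genes such as HBB and WT1, the 5′/3′ labels would be swapped.

```python
from itertools import combinations

def gaps_from_blocks(blocks, min_flank=25):
    """Yield hypothetical (exon1_start, intron_start, intron_end, exon2_end) four-tuples
    for each gap between consecutive aligned blocks of one EST, keeping only gaps whose
    two flanking blocks are at least min_flank bases long."""
    for (s1, e1), (s2, e2) in zip(blocks, blocks[1:]):
        if e1 - s1 + 1 >= min_flank and e2 - s2 + 1 >= min_flank:
            yield s1, e1 + 1, s2 - 1, e2

def pool_introns(ests, min_flank=25):
    """Map each intron (start, end) to [widest exon1_start, widest exon2_end, support]."""
    pooled = {}
    for blocks in ests:
        for x1, i1, i2, x2 in gaps_from_blocks(blocks, min_flank):
            rec = pooled.setdefault((i1, i2), [x1, x2, 0])
            rec[0] = min(rec[0], x1)
            rec[1] = max(rec[1], x2)
            rec[2] += 1  # one more EST supports this intron
    return pooled

def in_exon(pos, intron, rec):
    """True if pos lies in either exon flanking the given intron."""
    return rec[0] <= pos < intron[0] or intron[1] < pos <= rec[1]

def find_css(ests, min_flank=25):
    """Return (kind, location, cryptic_pos, authentic_intron) for every intron pair that
    shares exactly one end and whose mismatching end lies in an exon of the other."""
    pooled = pool_introns(ests, min_flank)
    calls = []
    for a, b in combinations(sorted(pooled), 2):
        same_start, same_end = a[0] == b[0], a[1] == b[1]
        if not (same_start or same_end):
            continue  # no shared end: not a css pattern
        end = 0 if same_end else 1          # index of the mismatching end
        kind = "5'" if same_end else "3'"   # plus-strand convention
        ra, rb = pooled[a], pooled[b]
        if not (in_exon(a[end], b, rb) or in_exon(b[end], a, ra)):
            continue
        # The intron with more supporting ESTs is authentic; ties go to `a`.
        auth, minor = (a, b) if ra[2] >= rb[2] else (b, a)
        cryptic = minor[end]
        if in_exon(cryptic, auth, pooled[auth]):
            where = "exonic"
        elif auth[0] <= cryptic <= auth[1]:
            where = "intronic"
        else:
            where = "outside"
        calls.append((kind, where, cryptic, auth))
    return calls

# Five ESTs splice intron 101-200; one minor EST uses an upstream donor at 85.
ests = [[(1, 100), (201, 300)]] * 5 + [[(1, 84), (201, 300)]]
print(find_css(ests))
```

In this toy example the minor EST shares the 3′ end of the common deletion but ends its upstream exon early, so the sketch reports a single exonic 5′ css at position 85 paired with the authentic intron 101–200, which corresponds to the kind of pattern shown in Figure 1A.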
CSF provides a very good way of screening large numbers of transcripts for the minority that are likely to have been generated by use of a css or a nearby alternative splice site. However, candidate transcripts do need to be checked, for example by using the Splign alignment option (15). CSF classifies a minority of less well-supported deletions as authentic splices because these happen to meet the CSF conditions. These sites can be easily recognized from the CSF output because they have lower levels of support than other authentic splice sites and do not match reference sequence splice sites. Caution is needed when interpreting these particular authentic splice sites and also css that are paired with authentic splice sites that have a low parenthesis score. See Supplementary Data for illustrative examples.

Searching CSF. Two methods of searching for css are provided (Supplementary Figure S1). In the first method, a landmark EST accession is submitted and CSF returns a list of css within the genomic range of the entered EST. In the second method, an arbitrary genomic interval can be specified and CSF will return a hierarchical list of css for that interval. Entire chromosomes can also be entered, such as NC_000001 (human chromosome 1), for which CSF currently lists 3232 css. Links are provided to view the sequence alignments of individual css. CSF can predict css for Homo sapiens, Bos taurus, Mus musculus, Danio rerio and Arabidopsis thaliana, and we intend to expand this range as further transcript data become available.

Figure 1. (A) CSF searches for transcript alignments that form one of four patterns (i–iv). All of these patterns contain a group of major transcripts that share a common deletion and a minor transcript that shares only one of the deletion endpoints. CSF defines the common deletion endpoints as authentic splice sites and the less common deletion endpoint of the minor transcript(s) as a cryptic or alternative splice site (arrowed). (B) Schematic of the HBB gene for human β-globin, which contains two introns that are constitutively spliced from pre-mRNA. As illustrated, the vast majority of ESTs align as shown and define the three exons of this gene. The circle shows a pattern of ESTs that CSF is designed to recognize and that is reported in Figure 2A. The numbers in brackets show the genome coordinates of the three splice sites that are identified and listed by CSF (Figure 2A).

Statistical analysis. There are 91 known different intron positions within the coding region of the actin gene family; these have been identified by sequencing actin genes from over 160 different species (16–18). The positions of all of these introns, together with the 14 css that we have identified, are plotted in Supplementary Figure S2A. Actin genes usually have 375 codons and therefore three times this number of possible positions for introns or css, so the probability of a single css exactly matching an intron position by chance is 91/(3 × 375). Fisher's exact test gives a P-value of 1.6 × 10⁻¹⁰ for 11 or more matches out of 14 occurring by chance. Similarly, we identified 135 css in the coding region of 51 different genes of the ribosomal protein gene database. The 51 genes are 190 codons in size on average and have a total of 957 introns (from 22 species). The probability of a css exactly matching an intron by chance is therefore 957/(190 × 3 × 51) = 0.033. The binomial probability of 33 or more matches out of 135 occurring by chance is P = 2.2 × 10⁻¹⁵.

RT–PCR. Total RNA was extracted using Trizol, and first-strand cDNA was synthesized from 0.1 µg of RNA using SuperScript III (Invitrogen) and random hexamers. PCR products were generated with Taq polymerase (NBI) for 25 or 35 cycles and separated on 5% native polyacrylamide gels. PCR bands were excised and cloned into pGEM-T Easy vectors (Promega) for sequencing by colony PCR followed by ABI Prism BigDye Terminator cycle sequencing (Applied Biosystems).

RESULTS

Principle of css detection by CSF

Css are often used highly efficiently in genetic disease following the mutation of nearby, more competitive splice sites.
We therefore reasoned that css might be used at a low but detectable frequency by normal genes. Figure 1A shows the patterns of EST alignments that CSF is designed to identify. CSF identifies groups of ESTs or transcripts that share a common deletion when aligned to the genome, together with one or more minor transcripts that share just one of the common deletion endpoints. CSF defines the common deletion endpoints as authentic splice sites and the unusual deletion endpoint of the minor transcript as a cryptic or alternative splice site (see Materials and Methods section for further details). Figure 1B illustrates how ESTs align to the human HBB gene for β-globin, which has two introns. The circle identifies a pattern of alignments that is recognized by CSF because it includes a minor transcript with an unusual deletion endpoint (position 5 204 752) that is a predicted css (Figures 1B and 2A). Such css predictions turn out to be accurate (see below) because of the restriction that the predicted css must be paired with a commonly used splice site. This distinguishes mRNA deletions that are generated by low-level aberrant splicing from deletions that are generated by non-splicing mechanisms, such as errors during transcription or during the generation of the EST.

Figure 2A shows the CSF output for HBB, which is one of the first genes in which css were identified (19). CSF identifies a single EST, BU198526, as having been generated by aberrant splicing. BU198526 aligns to HBB as illustrated in Figure 1B(ii); this alignment can be viewed in detail via a link to Splign (Figure 2). As explained above, BU198526 was identified by CSF because it forms a pattern of alignment with other ESTs as illustrated in Figure 1A(ii). The CSF output (Figure 2A) shows that BU198526 shares a deletion endpoint at position 52 044 605 with 703 other ESTs (link provided) but differs in having an unusual 5′ deletion endpoint at position 5 204 752, which is the predicted css. A comparison with the known css of HBB confirms that 5 204 752 is indeed a css (Supplementary Table S1). We show below that css predictions by CSF are very reliable even when supported by just a single EST, as in this case. However, HBB has 11 known css (Supplementary Table S1), which illustrates that CSF is limited in its predictions, most probably by the amount of available transcript data.

For alternatively spliced genes, CSF identifies both css and nearby alternative splice sites that act to shorten or lengthen exons, because the two types of splice sites are rather similar. For example, the css identified in HBB (Figure 2A) might be considered an alternative splice site if it were used at a frequency greater than 1 in 703. WT1 is an example of a gene that encodes at least three alternatively spliced mRNAs and for which there are only 80 ESTs.
CSF identifies two css (Figure 2B); however, the CSF output also shows that the predicted css at 32 370 103, for example, is supported by five ESTs, including the reference sequence NM_000378.3, and that the authentic splice site nine bases away at 32 370 094 is supported by only eight transcripts. The similar usage of the authentic site and the css shows that these are really alternative 5′ss. This type of information is nevertheless useful, because alternative splice sites are also implicated in genetic disease; in this particular case, disruption of the splice site at position 32 370 094 gives rise to Frasier syndrome, possibly owing to increased use of the splice site at 32 370 103 (20,21).

To test CSF, we analyzed a database called DBASS (database of aberrant splice sites), which lists 340 human genes that have one or more css that are activated in genetic disease (3). There are 814 different css listed in DBASS, and CSF predicts 609 css from the same set of 340 genes. Fifty-eight percent of these predictions are supported by only single ESTs (Supplementary Table S1). Before comparing the css identified by CSF with those of DBASS, we first asked whether the css predictions that are supported by just single ESTs were likely to have been generated by aberrant splicing. We reasoned that, because CSF identifies deletion endpoints irrespective of their sequence, these rare deletion endpoints should look like splice sites if they were generated by splicing. The predictions were divided into 5′ and 3′ css, and Table 1 shows that the predicted css match the expected 5′- or 3′-splice site consensus sequence very well, which is strong evidence that the vast majority of even the rarest deletions identified by CSF were generated by aberrant splicing.

Figure 2. (A) CSF output for the human gene HBB for β-globin. (B) CSF output for WT1 (see text). The coordinates used refer to the NCBI36/hg18 human genome assembly. Note that the HBB and WT1 genes align in a 3′→5′ direction with respect to their genome coordinates.

Of the two sets of DBASS and CSF css, only 46 are in common (Supplementary Tables S1 and S2). However, there are only 61 cases where the DBASS and CSF css are located within the same exon or intron (Supplementary Table S1, column 5), giving a match rate of 46/61 or 75%. Thirteen of the 15 DBASS css that did not match either were created de novo by mutation, and so would not be expected to be identified by CSF, or were of the opposite type (for example, a 5′ css in DBASS and a 3′ css in CSF) and so would be unlikely to match (Supplementary Table S1). Only two of the 48 css predictions that could be meaningfully compared with DBASS did not match exactly (Supplementary Table S1). This result, together with the clear consensus sequence results shown in Table 1, indicates that CSF predictions are highly reliable and generate only a very low level of false positives, even when supported by single ESTs. The small number of css (46) in common between the CSF database and DBASS (Supplementary Table S1) presumably indicates that both DBASS and CSF have identified relatively few of all possible css within this set of 340 genes. We confirmed that css predicted by CSF could also be detected experimentally. Five css predictions by CSF that DBASS shows to be activated in patients were detected in human cell lines that do not carry the causative genetic mutations (Figure 3, lanes 1–5). The last lane shows an
example of a css prediction by CSF that has not been reported in patients. The PCR products marked with asterisks were confirmed by sequencing (Supplementary Figure S3). These results confirmed the CSF predictions and also show that the css we analyzed were already active in a normal genetic background. Css were previously reported to be active at very low levels in normal globin genes (22), and our results systematically extend this finding. It is clear from Figure 3 that some splice sites that have been classified as css are used by normal genes at a relatively high frequency (lanes 2–5), in accordance with the CSF analysis (Supplementary Table S2). These css can therefore also be regarded as alternative splice sites that are further activated in disease.

Css and introns

We previously identified nine css within the coding region of the actin gene family by experiment and reported that eight of the nine exactly match the position of introns in actin genes from other species (13,14). CSF identified eight css within the coding region of the actin gene family, of which three were identical in position to the previously identified actin css. Of the five new css identified by CSF, three exactly match the position of introns in other species, thereby extending our previous study and adding further support to the validity of CSF predictions (Supplementary Figure S2). Because CSF predictions are sufficiently accurate, we could use this program to establish whether there is a general relationship between the positions of css and introns. We therefore analyzed an extensive intron database compiled from some 80 ribosomal protein genes from a wide range of species (23). Figure 4A
shows a small part of this database, which records the positions of introns against a 14-amino-acid stretch of the gene RPS5 from 23 species (file kindly provided by N. Kenmochi). Introns that fall between codons are indicated by red shading, and introns that fall within codons are indicated in blue (phase 1) and green (phase 2). For example, there is an intron located within a codon for valine (V) in the fungus Cryptococcus neoformans but in no other listed species. This observation is typical: although introns are widespread among eukaryotes, the vast majority of individual intron positions are found in only a minority of species (Supplementary Table S3) (18). We screened for css in ribosomal protein genes from human, mouse, Danio rerio and Arabidopsis thaliana and identified 74, 41, 16 and 4 css, respectively, within coding regions; these diminishing numbers simply reflect the amount of EST data that is available. Alignment of the predicted css indicates that the large majority are correct because they conform to splice site consensus sequences (Supplementary Table S4). All of these css were used at low frequencies, similar to the HBB example of Figure 2A. Figure 4B shows how we record the position of css; in this example, there is a css in the human RPS5 gene that exactly matches the position of introns in Drosophila (Dm), the honeybee (Am) and a fungus (Cc), and a nearby css in the mouse (Mm) that exactly matches the position of introns in three plant species (At, Os, Cr). We have identified 135 css to date, and of these at least five are conserved between species (Figure 4C and Supplementary Figure S4). Thirty-three of the 135 css of the ribosomal protein gene family exactly match the position of introns in other species (Supplementary Figure S4). This proportion is smaller than the 11/14 match observed for the actin gene family (Supplementary Figure S2) but is still highly significant (P = 2.2 × 10⁻¹⁵; see Materials and Methods section).

Table 1. The alignment of css predictions by CSF reveals consensus sequences typical of splice sites

Base (%)      −5     −4     −3     −2     −1     +1     +2     +3     +4     +5
5′ css predictions
T           28.8*  17.6   21.2   14.1   11.8    0.6   90.6*   5.9   11.8   10.0
C           22.4   24.1   27.6*  17.1    4.7    0.6    7.6   11.8   20.0   12.9
G           24.1   34.7*  25.3   17.6   63.5*  97.1*   0.6   40.6   30.0   63.5*
A           24.1   19.4   25.3   50.6*  19.4    1.2    0.6   41.2*  37.6*  12.9
3′ css predictions
T           31.8   22.9   21.2    4.5    2.2   17.9   26.3   31.8*  26.3   30.2*
C           32.4*  30.2*  59.8*   0.6    0.6   23.5   24.6   29.6   32.4*  26.3
G           20.1   24.0    5.0    1.1   96.6*  33.0*  18.4   21.2   22.3   21.8
A           15.6   22.9   14.0   93.9*   0.6   25.7   30.7*  17.3   19.0   21.8

This table is compiled from 169 and 179 examples of 5′ and 3′ css predictions, respectively, that are supported by only single ESTs (Supplementary Table S1). The relative proportions of the bases T, C, G and A are shown as percentages at five positions both upstream (−5 to −1) and downstream (+1 to +5) of the predicted cryptic cleavage site. The most frequently occurring base at each position is marked with an asterisk.

[Figure 3 image: lanes 2–6 labelled OAS1, WT1, ALG3, SLC35A1 and SLC35A1; expected PCR product sizes shown: 161, 219/120, 149/140, 232/194, 229/97 and 237 bp]

Figure 3. Experimental confirmation of CSF predictions. Predicted css are shown by the vertical boxes for the indicated genes. Active css would be expected to generate PCR fragments of the sizes shown in the gene diagrams. PCR products marked with asterisks were sequenced to confirm the use of the predicted css (Supplementary Figure S3). Messenger RNA was prepared from the human cell lines K562 (lane 1) and HepG2 (lanes 2 and 3) and from primary mesenchymal stem cells (lanes 4–6), and used for RT–PCR with the indicated primers (see Supplementary Data).

Figure 4. Comparison of intron and css positions for a small part of the ribosomal protein gene database (23). (A) An alignment of 14 amino acids of the RPS5 gene that marks the position of introns for 23 species. (B) The same alignment including two css positions identified by CSF. (C) An alignment of part of RPL7A that illustrates the conservation of two css that also match an intron in Chlamydomonas reinhardtii (Cr). An asterisk (*) marks the position of css that match introns; a caret (^) marks conserved css.

DISCUSSION

The CSF program is designed to identify transcripts that are generated through the low-level use of css by normal genes. CSF also identifies a subset of alternative splice sites that are similar to css but are used at a greater frequency. Both types of splice sites are reported to be activated in genetic disease as a result of mutations that disrupt the function of nearby competitive splice sites (4,5). Our analyses show that css predictions by CSF are very reliable, and so we would expect, for example, that many more of the predicted css and alternative splice sites of Supplementary Table S1 will be found to be associated with genetic disease. The identification of css by CSF is limited by the amount of available transcript data, but this will improve as further transcript sequences become available, particularly with the advent of mRNA deep sequencing (24). CSF was designed primarily to identify css in highly conserved gene families such as actin in order to advance our understanding of intron origin. We found that about 25% of the css within the coding sequence of the large family of genes that encode ribosomal proteins exactly match the position of introns that are present in equivalent genes from other species (Figure 4 and Supplementary Figure S4).
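To put these match rates in context, the chance probabilities given under Statistical analysis in Materials and Methods can be recomputed in a few lines of Python. This is a minimal sketch that assumes, as that subsection does, three possible intron positions per codon; the function names are hypothetical. It applies a one-sided binomial upper tail to both gene families as a simple stand-in: the authors report Fisher's exact test for actin and do not give every detail of their calculations, so this sketch illustrates the form of the calculation and is not expected to reproduce the published P-values exactly.

```python
from math import comb

def chance_match_probability(n_intron_positions, codons_per_gene, n_genes=1):
    """Chance that one css falls exactly on a known intron position,
    assuming 3 candidate positions per codon (phases 0, 1 and 2)."""
    return n_intron_positions / (3 * codons_per_gene * n_genes)

def binomial_upper_tail(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Figures taken from the Statistical analysis subsection.
p_actin = chance_match_probability(91, 375)       # 91 / (3 x 375)
p_ribo = chance_match_probability(957, 190, 51)   # 957 / (190 x 3 x 51)

for label, n_css, n_match, p in [("actin", 14, 11, p_actin),
                                 ("ribosomal", 135, 33, p_ribo)]:
    expected = n_css * p  # matches expected if css were placed at random
    tail = binomial_upper_tail(n_match, n_css, p)
    print(f"{label}: p={p:.3f} expected={expected:.1f} "
          f"observed={n_match} P(X>={n_match})={tail:.1e}")
```

The expected counts make the enrichment concrete: about 1.1 chance matches are expected for the 14 actin css against 11 observed, and about 4.4 for the 135 ribosomal protein css against 33 observed.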
The corresponding match between actin css and introns is 78%; however, far more phylogenetic data are available for the actin gene family, and so relatively more of its introns have been discovered. Consequently, the css recorded in Supplementary Figure S4 predict the positions of as yet undiscovered introns in the ribosomal protein gene family. The well-known and valuable early and late models of intron origin both assume that the splicing machinery evolved for the purpose of removing introns that were either present in the most ancestral genomes or were inserted after the separation of the prokaryotes (25–27). At the time the models were proposed, non-intronic splicing information was not generally thought to be of major relevance and so is not an important feature of either model (28). However, it is now established that exon junction sequences are older and better conserved than most if not all introns and were sites for intron insertions during evolution (18,29–33). Indeed, a number of intron properties, such as their phasing with respect to the coding sequence (29,30,34) and their location with respect to protein structure, have now been largely attributed to the flanking exon junction sequence (35,36). Consequently, our finding that css often match the position of exon junction sequences in gene homologues strongly indicates that css were targeted by intron insertions during evolution. Our data also indicate that the information that lies outside some introns is not only conserved with homologues that lack such introns but is also capable of being independently recognized by the splicing machinery and can define the position of the missing introns (13). This is a striking observation because, although the splicing information that flanks introns contributes to their recognition, there is no mechanistic reason for this information to be autonomous rather than auxiliary in nature for the purpose of intron removal (37). It is unlikely that a css is recognized from the immediate splice-site information alone (8–10), suggesting that other information, such as splicing enhancer sequences, is also conserved between some gene homologues independently of intron presence. All of the available evidence indicates that the splicing information that flanks introns was largely in place before their insertion (18,29–33). The key question we have started to address here is whether such information might have been functional. Our data so far do not prove, but certainly support, the suggestion that information of this nature could have been used by the splicing machinery for splicing purposes before the arrival of introns, which is a rather different concept from the early and late models of intron origin (28). The splicing information that flanks introns is perhaps similar to the splicing information that enables the mRNA of intronless genes to interact with components of the splicing machinery for the purpose of mRNA biogenesis, nonsense-mediated decay or alternative splicing (38–43). For genes that have introns, there are still many reports of intron-independent splicing between exonic splice sites (44–53). If intron-independent splicing is an ancestral mechanism, then it may be far more prevalent across species than is currently realized.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We are grateful to Naoya Kenmochi for his help and would also like to thank Anil Chandrashekran, Andy Newman, Peter Rogan, Alison Russell, Malcolm Parker, Terrie Sadusky and Arlin Stoltzfus.

FUNDING

Research grants from the Biotechnology and Biological Sciences Research Council and Atazoa. Funding for open access charge: Genesis Trust.

Conflict of interest statement. None declared.

1. Padgett,R.A., Grabowski,P.J., Konarska,M.M., Seiler,S. and Sharp,P.A. (1986) Splicing of messenger RNA precursors. Annu. Rev. Biochem., 55, 1119–1150.
2. Green,M.R. (1986) Pre-mRNA splicing. Annu. Rev. Genet., 20, 671–708.
3. Buratti,E., Chivers,M., Hwang,G. and Vorechovsky,I. (2011) DBASS3 and DBASS5: databases of aberrant 3′- and 5′-splice sites. Nucleic Acids Res., 39, D86–D91.
4. Buratti,E., Chivers,M., Kralovicova,J., Romano,M., Baralle,M., Krainer,A.R. and Vorechovsky,I. (2007) Aberrant 5′ splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res., 35, 4250–4263.
5. Wang,G.S. and Cooper,T.A. (2007) Splicing in disease: disruption of the splicing code and the decoding machinery. Nat. Rev. Genet., 8, 749–761.
6. Hastings,M.L., Resta,N., Traum,D., Stella,A., Guanti,G. and Krainer,A.R. (2005) An LKB1 AT-AC intron mutation causes Peutz-Jeghers syndrome via splicing at noncanonical cryptic splice sites. Nat. Struct. Mol. Biol., 12, 54–59.
7. Lopez-Bigas,N., Audit,B., Ouzounis,C., Parra,G. and Guigo,R. (2005) Are splicing mutations the most frequent cause of hereditary disease? FEBS Lett., 579, 1900–1903.
8. Wimmer,K., Roca,X., Beiglbock,H., Callens,T., Etzler,J., Rao,A.R., Krainer,A.R., Fonatsch,C. and Messiaen,L. (2007) Extensive in silico analysis of NF1 splicing defects uncovers determinants for splicing outcome upon 5′ splice-site disruption. Hum. Mutat., 28, 599–612.
9. Kralovicova,J. and Vorechovsky,I. (2007) Global control of aberrant splice-site activation by auxiliary splicing sequences: evidence for a gradient in exon and intron definition. Nucleic Acids Res., 35, 6399–6413.
10. Russo,A., Siciliano,G., Catillo,M., Giangrande,C., Amoresano,A., Pucci,P., Pietropaolo,C. and Russo,G. (2010) hnRNP H1 and intronic G runs in the splicing control of the human rpL3 gene. Biochim. Biophys. Acta., 1799, 419–428.
11. Divina,P., Kvitkovicova,A., Buratti,E. and Vorechovsky,I. (2009) Ab initio prediction of mutation-induced cryptic splice-site activation and exon skipping. Eur. J. Hum. Genet., 17, 759–765.
12. Betz,B., Theiss,S., Aktas,M., Konermann,C., Goecke,T.O., Moslein,G., Schaal,H. and Royer-Pokora,B. (2009) Comparative in silico analyses and experimental validation of novel splice site and missense mutations in the genes MLH1 and MSH2. J. Cancer Res. Clin. Oncol., 136, 123–134.
13. Sadusky,T., Newman,A.J. and Dibb,N.J. (2004) Exon junction sequences as cryptic splice sites: implications for intron origin. Curr. Biol., 14, 505–509.
14. Stoltzfus,A. (2004) Molecular evolution: introns fall into place. Curr. Biol., 14, R351–R352.
15. Kapustin,Y., Souvorov,A., Tatusova,T. and Lipman,D. (2008) Splign: algorithms for computing spliced alignments with identification of paralogs. Biol. Direct., 3, 20.
16. Bhattacharya,D. and Weber,K. (1997) The actin gene of the glaucocystophyte Cyanophora paradoxa: analysis of the coding region and introns, and an actin phylogeny of eukaryotes. Curr. Genet., 31, 439–446.
17. Flakowski,J., Bolivar,I., Fahrni,J. and Pawlowski,J. (2006) Tempo and mode of spliceosomal intron evolution in actin of foraminifera. J. Mol. Evol., 63, 30–41.
18. Qiu,W.G., Schisler,N. and Stoltzfus,A. (2004) The evolutionary gain of spliceosomal introns: sequence and phase preferences. Mol. Biol. Evol., 21, 1252–1263.
19. Treisman,R., Orkin,S.H. and Maniatis,T. (1983) Specific transcription and RNA splicing defects in five cloned beta-thalassaemia genes. Nature, 302, 591–596.
20. Barbaux,S., Niaudet,P., Gubler,M.C., Grunfeld,J.P., Jaubert,F., Kuttenn,F., Fekete,C.N., Souleyreau-Therville,N., Thibaud,E., Fellous,M. et al. (1997) Donor splice-site mutations in WT1 are responsible for Frasier syndrome. Nat. Genet., 17, 467–470.
21. Niaudet,P. and Gubler,M.C. (2006) WT1 and glomerular diseases. Pediatr. Nephrol., 21, 1653–1660.
22. Haj Khelil,A., Deguillien,M., Moriniere,M., Ben Chibani,J. and Baklouti,F. (2008) Cryptic splicing sites are differentially utilized in vivo. FEBS J., 275, 1150–1162.
23. Yoshihama,M., Nguyen,H.D. and Kenmochi,N. (2007) Intron dynamics in ribosomal protein genes. PLoS ONE, 2, e141.
24. Pickrell,J.K., Pai,A.A., Gilad,Y. and Pritchard,J.K. (2010) Noisy splicing drives mRNA isoform diversity in human cells. PLoS Genet., 6, e1001236.
25. Cavalier-Smith,T. (1985) Selfish DNA and the origin of introns. Nature, 315, 283–284.
26. Gilbert,W. (1978) Why genes in pieces? Nature, 271, 501.
27. Roy,S.W. and Gilbert,W. (2006) The evolution of spliceosomal introns: patterns, puzzles and progress. Nat. Rev. Genet., 7, 211–221.
28. Dibb,N.J. (1993) Why do genes have introns? FEBS Lett., 325, 135–139.
29. Nguyen,H.D., Yoshihama,M. and Kenmochi,N. (2006) Phase distribution of spliceosomal introns: implications for intron origin. BMC Evol. Biol., 6, 69.
30. Ruvinsky,A., Eskesen,S.T., Eskesen,F.N. and Hurst,L.D. (2005) Can codon usage bias explain intron phase distributions and exon symmetry? J. Mol. Evol., 60, 99–104.
31. Bhattacharya,D., Simon,D., Huang,J., Cannone,J.J. and Gutell,R.R. (2003) The exon context and distribution of Euascomycetes rRNA spliceosomal introns. BMC Evol. Biol., 3, 7.
32. Dibb,N.J. and Newman,A.J. (1989) Evidence that introns arose at proto-splice sites. EMBO J., 8, 2015–2021.
33. Sverdlov,A.V., Rogozin,I.B., Babenko,V.N. and Koonin,E.V. (2004) Reconstruction of ancestral protosplice sites. Curr. Biol., 14, 1505–1508.
34. Long,M., de Souza,S.J., Rosenberg,C. and Gilbert,W. (1998) Relationship between proto-splice sites and intron phases: evidence from dicodon analysis. Proc. Natl Acad. Sci. USA, 95, 219–223.
35. De Kee,D.W., Gopalan,V. and Stoltzfus,A. (2007) A sequence-based model accounts largely for the relationship of intron positions to protein structural features. Mol. Biol. Evol., 24, 2158–2168.
36. Whamond,G.S. and Thornton,J.M. (2006) An analysis of intron positions in relation to nucleotides, amino acids, and protein secondary structure. J. Mol. Biol., 359, 238–247.
37. Burge,C.B., Tuschl,T. and Sharp,P.A. (eds), (1999) Splicing of Precursors to mRNAs by the Spliceosomes, 2nd edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
38. Brogna,S. and Wen,J. (2009) Nonsense-mediated mRNA decay (NMD) mechanisms. Nat. Struct. Mol. Biol., 16, 107–113.
39. Guang,S., Felthauser,A.M. and Mertz,J.E. (2005) Binding of hnRNP L to the pre-mRNA processing enhancer of the herpes simplex virus thymidine kinase gene enhances both polyadenylation and nucleocytoplasmic export of intronless mRNAs. Mol. Cell. Biol., 25, 6303–6313.
40. Guang,S. and Mertz,J.E. (2005) Pre-mRNA processing enhancer (PPE) elements from intronless genes play additional roles in mRNA biogenesis than do ones from intron-containing genes. Nucleic Acids Res., 33, 2215–2226.
41. Juneau,K., Nislow,C. and Davis,R.W. (2009) Alternative splicing of PTC7 in Saccharomyces cerevisiae determines protein localization. Genetics, 183, 185–194.
42. Pozzoli,U., Riva,L., Menozzi,G., Cagliani,R., Comi,G.P., Bresolin,N., Giorda,R. and Sironi,M. (2004) Over-representation of exonic splicing enhancers in human intronless genes suggests multiple functions in mRNA processing. Biochem. Biophys. Res. Commun., 322, 470–476.
43. Brody,Y., Neufeld,N., Bieberstein,N., Causse,S.Z., Bohnlein,E.M., Neugebauer,K.M., Darzacq,X. and Shav-Tal,Y. (2011) The In Vivo Kinetics of RNA Polymerase II Elongation during Co-Transcriptional Splicing. PLoS Biol., 9, e1000573.
44. Ng,B., Yang,F., Huston,D.P., Yan,Y., Yang,Y., Xiong,Z., Peterson,L.E., Wang,H. and Yang,X.F. (2004) Increased noncanonical splicing of autoantigen transcripts provides the structural basis for expression of untolerized epitopes. J. Allergy Clin. Immunol., 114, 1463–1470.


45. Rovescalli,A.C., Cinquanta,M., Ferrante,J., Kozak,C.A. and Nirenberg,M. (2000) The mouse Nkx-1.2 homeobox gene: alternative RNA splicing at canonical and noncanonical splice sites. Proc. Natl Acad. Sci. USA, 97, 1982–1987.
46. Baumbusch,L.O., Myhre,S., Langerod,A., Bergamaschi,A., Geisler,S.B., Lonning,P.E., Deppert,W., Dornreiter,I. and Borresen-Dale,A.L. (2006) Expression of full-length p53 and its isoform Deltap53 in breast carcinomas in relation to mutation status and clinical parameters. Mol. Cancer, 5, 47.
47. Cagliani,R., Bardoni,A., Sironi,M., Fortunato,F., Prelle,A., Felisari,G., Bonaglia,M.C., D'Angelo,M.G., Moggio,M., Bresolin,N. et al. (2003) Two dystrophin proteins and transcripts in a mild dystrophinopathic patient. Neuromuscul. Disord., 13, 13–16.
48. Chikaev,N.A., Bykova,E.A., Najakshin,A.M., Mechetina,L.V., Volkova,O.Y., Peklo,M.M., Shevelev,A.Y., Vlasik,T.N., Roesch,A., Vogt,T. et al. (2005) Cloning and characterization of the human FCRL2 gene. Genomics, 85, 264–272.
49. Cocquet,J., Chong,A., Zhang,G. and Veitia,R.A. (2006) Reverse transcriptase template switching and false alternative transcripts. Genomics, 88, 127–131.
50. Cox,P.R., Siddique,T. and Zoghbi,H.Y. (2001) Genomic organization of Tropomodulins 2 and 4 and unusual intergenic and intraexonic splicing of YL-1 and Tropomodulin 4. BMC Genomics, 2, 7.
51. Galante,P.A., Sakabe,N.J., Kirschbaum-Slager,N. and de Souza,S.J. (2004) Detection and evaluation of intron retention events in the human transcriptome. RNA, 10, 757–765.
52. Lukas,J., Gao,D.Q., Keshmeshian,M., Wen,W.H., Tsao-Wei,D., Rosenberg,S. and Press,M.F. (2001) Alternative and aberrant messenger RNA splicing of the mdm2 oncogene in invasive breast cancer. Cancer Res., 61, 3212–3219.
53. Aebi,M., Hornig,H., Padgett,R.A., Reiser,J. and Weissmann,C. (1986) Sequence requirements for splicing of higher eukaryotic nuclear pre-mRNA. Cell, 47, 555–565.




DNA replication: Building the perfect switch

John F.X. Diffley

A sophisticated molecular switch ensures that replication origins are activated just once in each cell cycle. Recent work reveals how the proteolysis of a key replication inhibitor, geminin, by the anaphase promoting complex/cyclosome is an important component of this switch.
Address: ICRF Clare Hall Laboratories, South Mimms, EN6 3LD, UK.

Current Biology 2001, 11:R367–R370
0960-9822/01/$ – see front matter
© 2001 Elsevier Science Ltd. All rights reserved.

The sequencing of the human genome has been hailed as one of humankind's great achievements, in part because of the sheer magnitude of the endeavour. The accurate sequence of the three billion or so nucleotides of the human genome has involved many scientists and has taken years to assemble. Yet every proliferating human cell is faced with the prospect of having to copy this same information accurately and precisely in the space of only a few hours during the cell cycle. Either incomplete replication or over-replication would cause cell death or, worse, would generate the kinds of genetic alterations associated with diseases like cancer. To accomplish this feat in the allotted time, eukaryotic cells have developed a 'divide and conquer' strategy. Unlike their prokaryotic counterparts, eukaryotic genomes are replicated from multiple replication origins distributed along their chromosomes. In human somatic cells, replication occurs from 10,000–100,000 such replication origins; thus, each replication origin is only responsible for the replication of a relatively small portion of the genome. This strategy can allow rapid replication of large genomes but brings with it a serious bookkeeping problem. How can the cell keep track of all of these origins, ensuring that each one fires efficiently during S phase while also ensuring that no origin fires more than once?

To cope with this, eukaryotic cells have evolved a remarkable molecular switch which, when turned on, promotes just a single initiation event from each origin. Two recent studies [1,2] of DNA replication in Xenopus show in greater detail the workings of this switch. At its heart is the tightly regulated assembly of pre-replicative complexes (pre-RCs) at replication origins in a reaction known as 'licensing' (Figure 1). Pre-RCs assemble in a stepwise manner: the origin recognition complex (ORC), a sequence-specific DNA binding protein, binds first and remains bound to origins during most or all of the cell cycle. Cdc6 then enters the complex and cooperates with ORC to load six different but related polypeptides known as the Mcm2-7 complex [3]. Recent work in fission yeast and Xenopus has identified another pre-RC component, Cdt1, which enters the pre-RC independently of Cdc6 and is also required to load the Mcm2-7 complex [4,5]. The Mcm2-7 complex then plays a crucial role during initiation and during the ensuing elongation phase of DNA replication, perhaps acting as a replicative DNA helicase [6].

Pre-RCs can only assemble at origins during a short period of the cell cycle between the end of mitosis and a point late in G1 phase (Figure 1). This temporal separation of pre-RC assembly and origin activation is a key feature of the switch because it ensures that new pre-RCs cannot assemble on origins which have fired and, thus, that origins can fire just once in each cell cycle [7]. Understanding how licensing is prevented after S phase begins has therefore been the focus of much research in the field.

Cyclin-dependent kinases (Cdks) are central to this regulation. Cdks are essential for triggering the initiation of DNA replication from origins that contain pre-RCs. At the same time, Cdks appear to play a direct role in preventing the assembly of new pre-RCs [7]. Because Cdk activity remains high from the onset of S phase until the end of the following mitosis, re-licensing of origins cannot occur until the beginning of the next cell cycle. Although the picture is far from complete, it appears that Cdks prevent pre-RC assembly in multiple, redundant ways. In budding yeast, for example, Cdks target Cdc6 for SCF-dependent, ubiquitin-mediated degradation [8–10] and trigger the export of the Mcm2-7 complex from the nucleus [11,12]. It is likely that Cdks also act in other ways to prevent re-replication.

Figure 1. Regulation of licensing in eukaryotic cells. At the end of mitosis, the anaphase promoting complex/cyclosome (APC/C) is activated by Cdks. Active APC/C then contributes to the inactivation of Cdks by targeting the essential cyclin subunits for ubiquitin-mediated degradation. In this state (Cdks off, APC/C on), cells are competent to assemble pre-RCs at their origins. However, the presence of active APC/C prevents accumulation of necessary S phase promoting factors like S phase cyclins and the Cdc7 regulatory subunit, Dbf4. At the end of G1 phase, a switch is thrown which converts cells to a very different state in which origin firing is promoted while licensing is prevented (Cdks on, APC/C off). Both the activation of origin firing and the prevention of licensing require the activation of Cdks, which in turn requires Cdk-dependent inactivation of the APC/C. Further details are provided in the text.
[Diagram panels: G1 phase, Cdks OFF, APC/C ON: licensing allowed, origin firing prevented (ORC, Cdc6 and Cdt1 load MCM2-7 to form the pre-RC). S and G2 phase, Cdks ON, APC/C OFF: licensing prevented, origin firing allowed (ORC alone at the origin).]

Recent work in Xenopus [1,2,13,14] has revealed that another key cell cycle regulator, the anaphase promoting complex/cyclosome (APC/C), plays an important role in constraining licensing to the Cdk cycle. The APC/C is an E3 ubiquitin ligase whose activity is regulated by Cdks: it is activated in mitosis by Cdks associated with mitotic cyclins and inactivated in late G1 phase by Cdks associated with G1 cyclins [13]. In a screen for novel APC/C substrates, McGarry and Kirschner identified a protein they called geminin [14]. Consistent with it being an APC/C substrate, geminin is degraded in mitosis, and its degradation requires a cyclin-like destruction box near its amino terminus. By using a destruction box mutant that cannot be degraded, these authors showed that geminin could inhibit DNA replication and that this inhibition of replication correlated with an inhibition of Mcm loading.

Two groups have recently identified Cdt1 as the target of licensing inhibition by geminin [1,2]. Wohlschlegel et al. [1] found human Cdt1 as a protein that is tightly associated with geminin in co-immunoprecipitation experiments from human cell extracts. Using cell-free replication extracts from Xenopus eggs, they showed that the inhibition of DNA replication by geminin could be overcome by addition of excess Cdt1, suggesting that geminin may act by inhibiting Cdt1. Biochemical approaches from Blow and colleagues [15] had identified two protein fractions required for licensing, termed RLF-B and RLF-M. RLF-M was previously shown to comprise a heterohexameric complex of the Mcm2-7 proteins [15]. In their recent work, Tada et al. [2] show that geminin acts by inhibiting RLF-B. They used geminin-affinity chromatography to purify RLF-B and showed that RLF-B appears to be identical to Cdt1. Elution of Cdt1 from the geminin affinity column required 4 M urea, attesting to the tight interaction between these two proteins.

Although the regulated appearance and disappearance of geminin could be sufficient to explain how replication occurs only once per cell cycle, it is likely that there will be more to the story. McGarry and Kirschner [14] showed that geminin-depleted Xenopus extracts, while supporting efficient DNA replication, did not re-replicate their DNA. This demonstrates that there must be something in these extracts besides geminin which can block re-initiation. A clue to the nature of this inhibitor may come from the experiments of Tada et al. [2], who showed that the partial licensing activity in geminin-depleted metaphase extracts could be significantly enhanced by treatment of the extracts with chemical inhibitors of Cdks. This suggests that, even in the absence of geminin, Cdks are able to inhibit licensing to some extent. This is consistent with previous work showing that Cdk2 can prevent licensing in Xenopus egg extracts [16].

How does this inhibition work? Experiments in human cells and Xenopus [17–19] have shown that phosphorylation of Cdc6, presumably by cyclin A-associated kinase, causes its export from nuclei. This may be important for blocking licensing during S and G2 phases; however, it cannot explain the results of Tada et al. [2], since licensing in metaphase extracts occurs in the absence of a nuclear envelope. Therefore, there must be some additional and as yet unidentified way in which Cdks can prevent licensing.

Thus, in Xenopus, as in yeast, Cdks block licensing in multiple, redundant ways. First, they inactivate the APC/C during G1, allowing the accumulation of the licensing inhibitor, geminin. Second, they cause the export of Cdc6 from the nucleus; and third, they act on at least one additional, unidentified target. This may be important in making absolutely sure that re-initiation never occurs, and it illustrates that such redundancy may be a general feature of the eukaryotic cell cycle.

Figure 2. Different ways of licensing inhibition in different organisms. Details are provided in the text.
[Diagram panels: budding yeast: SCF-dependent Cdc6 proteolysis; MCM2-7 nuclear export. Fission yeast: SCF-dependent Cdc6 (Cdc18) proteolysis; Cdt1 proteolysis(?). Xenopus, humans: APC/C inactivation leading to geminin accumulation and Cdt1 inhibition; Cdc6 nuclear export.]

This points to another emerging trend: Cdks prevent re-replication by different means in different organisms (Figure 2). In budding yeast and fission yeast, Cdc6 (Cdc18 in fission yeast) protein levels are regulated; Cdc6 transcription is limited to late mitosis/early G1 phase, and the Cdc6 protein is targeted for ubiquitin-mediated degradation when Cdks are activated in late G1 phase [8–10,20–23]. In Xenopus, Cdc6 remains stable during the cell cycle but, instead, Cdk phosphorylation triggers its export from the nucleus [19]. As in Xenopus, the nuclear localisation of Cdc6 is also regulated by Cdks in human cells [17,18] and, in addition, Cdc6 is targeted for degradation by the APC/C [24,25]. Cdt1 also appears to be regulated differently in different organisms. Both transcription and proteolysis of Cdt1 are very similar to those of Cdc6 in fission yeast [5]. However, in Xenopus, Cdt1 is regulated by geminin which, in turn, is targeted for degradation by the APC/C [1,2,12]. In human cells, Cdt1 appears to be regulated both by geminin and, perhaps, by cell cycle-regulated proteolysis [1,14]. If a geminin-like licensing inhibitor exists in budding yeast, it is not regulated solely by the APC/C, since Cdk inactivation bypasses any requirement for the APC/C in licensing [26]. The APC/C may nonetheless have some role in preventing re-replication in budding yeast in a different way, for example by targeting S phase promoting factors like Clb5 and the Cdc7 regulatory subunit Dbf4 for degradation [27–30].

In conclusion, a single round of DNA replication per cell cycle is achieved by an intricate molecular mechanism which ensures that the loading and firing of replication origins cannot occur in a cell at the same time. The coupling of licensing to the Cdk cycle may be a universal feature of this switch, although how this is ultimately achieved may differ in different organisms. Finally, this short review has only addressed the negative side of the switch: how licensing is prevented after S phase begins. The other side of the switch, the mechanism by which Cdks activate replication origins, is currently far less well understood and is likely to be an area of intense interest in the future.
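To make the once-per-cycle logic of this switch concrete, here is a deliberately simplified toy model in Python. It is a hypothetical sketch, not a model from this article or the cited studies: the whole switch is reduced to one boolean (Cdks on/APC/C off versus Cdks off/APC/C on), each origin either carries a pre-RC or does not, and geminin, Cdc6 export and the other specific mechanisms are ignored. The names `Origin`, `Cell` and `run_cycle`, and the `enforce_switch` and `attempts_per_phase` parameters, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Origin:
    """A replication origin in a hypothetical toy model."""
    licensed: bool = False  # True while a pre-RC (ORC, Cdc6, Cdt1, Mcm2-7) is assembled
    firings: int = 0        # initiation events counted in the current cycle


@dataclass
class Cell:
    origins: List[Origin] = field(default_factory=list)
    cdk_on: bool = False         # False = "Cdks OFF, APC/C ON" (G1 state)
    enforce_switch: bool = True  # False simulates a leaky switch, for comparison

    def try_license(self) -> None:
        # Licensing (pre-RC assembly) is blocked whenever Cdks are on,
        # unless the switch is deliberately made leaky.
        if self.cdk_on and self.enforce_switch:
            return
        for origin in self.origins:
            origin.licensed = True

    def try_fire(self) -> None:
        # Firing needs Cdk activity and an assembled pre-RC; firing
        # dismantles the pre-RC, so the origin must be relicensed to fire again.
        if not self.cdk_on:
            return
        for origin in self.origins:
            if origin.licensed:
                origin.firings += 1
                origin.licensed = False


def run_cycle(cell: Cell, attempts_per_phase: int = 3) -> List[int]:
    """Run one cell cycle and return the number of firings per origin."""
    for origin in cell.origins:
        origin.firings = 0
    # End of mitosis to late G1: Cdks off, APC/C on.
    cell.cdk_on = False
    for _ in range(attempts_per_phase):
        cell.try_license()  # allowed
        cell.try_fire()     # prevented
    # Switch thrown in late G1: Cdks on, APC/C off (S, G2 and M).
    cell.cdk_on = True
    for _ in range(attempts_per_phase):
        cell.try_fire()     # allowed; consumes pre-RCs
        cell.try_license()  # prevented while the switch is enforced
    # Mitotic exit: APC/C activated, Cdk activity falls for the next cycle.
    cell.cdk_on = False
    return [origin.firings for origin in cell.origins]


if __name__ == "__main__":
    print(run_cycle(Cell(origins=[Origin() for _ in range(5)])))
    # [1, 1, 1, 1, 1]
    print(run_cycle(Cell(origins=[Origin() for _ in range(5)], enforce_switch=False)))
    # [3, 3, 3, 3, 3]
```

With the switch enforced, every origin fires exactly once per cycle no matter how many times licensing and firing are attempted. Making the switch leaky (`enforce_switch=False`) lets origins be relicensed while Cdks are on, so each origin fires once per attempt: a crude picture of re-replication.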

1. Wohlschlegel JA, Dwyer BT, Dhar SK, Cvetic C, Walter JC, Dutta A: Inhibition of eukaryotic DNA replication by geminin binding to Cdt1. Science 2000, 290:2309-2312.
2. Tada S, Li A, Maiorano D, Mechali M, Blow JJ: Repression of origin assembly in metaphase depends on inhibition of RLF-B/Cdt1 by geminin. Nat Cell Biol 2001, 3:107-113.
3. Takisawa H, Mimura S, Kubota Y: Eukaryotic DNA replication: from pre-replication complex to initiation complex. Curr Opin Cell Biol 2000, 12:690-696.
4. Maiorano D, Moreau J, Mechali M: XCDT1 is required for the assembly of pre-replicative complexes in Xenopus laevis. Nature 2000, 404:622-625.
5. Nishitani H, Lygerou Z, Nishimoto T, Nurse P: The Cdt1 protein is required to license DNA for replication in fission yeast. Nature 2000, 404:625-628.
6. Labib K, Diffley JFX: Is the MCM2-7 complex the eukaryotic DNA replication fork helicase? Curr Opin Genet Dev 2001, 11:64-70.
7. Diffley JFX: Once and only once upon a time: Specifying and regulating origins of DNA replication in eukaryotic cells. Genes Dev 1996, 10:2819-2830.
8. Elsasser S, Chi Y, Yang P, Campbell JL: Phosphorylation controls timing of Cdc6p destruction: a biochemical analysis. Mol Biol Cell 1999, 10:3263-3277.
9. Drury LS, Perkins G, Diffley JFX: The cyclin-dependent kinase Cdc28p regulates distinct modes of Cdc6p proteolysis during the budding yeast cell cycle. Curr Biol 2000, 10:231-240.
10. Calzada A, Sanchez M, Sanchez E, Bueno A: The stability of the Cdc6 protein is regulated by cyclin-dependent kinase/cyclin B complexes in Saccharomyces cerevisiae. J Biol Chem 2000, 275:9734-9741.
11. Labib K, Diffley JFX, Kearsey SE: G1-phase and B-type cyclins exclude the DNA-replication factor Mcm4 from the nucleus. Nat Cell Biol 1999, 1:415-422.
12. Nguyen VQ, Co C, Irie K, Li JJ: Clb/Cdc28 kinases promote nuclear export of the replication initiator proteins Mcm2-7. Curr Biol 2000, 10:195-205.
13. Zachariae W, Nasmyth K: Whose end is destruction: cell division and the anaphase-promoting complex. Genes Dev 1999, 13:2039-2058.
14. McGarry TJ, Kirschner MW: Geminin, an inhibitor of DNA replication, is degraded during mitosis. Cell 1998, 93:1043-1053.
15. Chong JPJ, Mahbubani HM, Khoo C-Y, Blow JJ: Purification of an MCM-containing complex as a component of the DNA replication licensing system. Nature 1995, 375:418-421.
16. Hua XH, Yan H, Newport J: A role for Cdk2 kinase in negatively regulating DNA replication during S phase of the cell cycle. J Cell Biol 1997, 137:183-192.
17. Petersen BO, Lukas J, Sorensen CS, Bartek J, Helin K: Phosphorylation of mammalian CDC6 by Cyclin A/CDK2 regulates its subcellular localization. EMBO J 1999, 18:396-410.
18. Saha P, Chen J, Thome KC, Lawlis SJ, Hou ZH, Hendricks M, Parvin JD, Dutta A: Human CDC6/Cdc18 associates with Orc1 and cyclin-cdk and is selectively eliminated from the nucleus at the onset of S phase. Mol Cell Biol 1998, 18:2758-2767.
19. Pelizon C, Madine MA, Romanowski P, Laskey RA: Unphosphorylatable mutants of Cdc6 disrupt its nuclear export but still support DNA replication once per cell cycle. Genes Dev 2000, 14:2526-2533.
20. Jallepalli PV, Tien D, Kelly TJ: sud1(+) targets cyclin-dependent kinase-phosphorylated Cdc18 and Rum1 proteins for degradation and stops unwanted diploidization in fission yeast. Proc Natl Acad Sci USA 1998, 95:8159-8164.
21. Baum B, Nishitani H, Yanow S, Nurse P: Cdc18 transcription and proteolysis couple S phase to passage through mitosis. EMBO J 1998, 17:5689-5698.
22. Wolf DA, McKeon F, Jackson PK: F-box/WD-repeat proteins pop1p and Sud1p/Pop2p form complexes that bind and direct the proteolysis of cdc18p. Curr Biol 1999, 9:373-376.
23. Kominami K, Toda T: Fission yeast WD-repeat protein pop1 regulates genome ploidy through ubiquitin-proteasome-mediated degradation of the CDK inhibitor Rum1 and the S-phase initiator Cdc18. Genes Dev 1997, 11:1548-1560.
24. Petersen BO, Wagener C, Marinoni F, Kramer ER, Melixetian M, Denchi EL, Gieffers C, Matteucci C, Peters JM, Helin K: Cell cycle- and cell growth-regulated proteolysis of mammalian CDC6 is dependent on APC-CDH1. Genes Dev 2000, 14:2330-2343.
25. Mendez J, Stillman B: Chromatin association of human origin recognition complex, cdc6, and minichromosome maintenance proteins during the cell cycle: assembly of prereplication complexes in late mitosis. Mol Cell Biol 2000, 20:8602-8612.
26. Noton EA, Diffley JFX: CDK inactivation is the only essential function of the APC/C and the mitotic exit network proteins for origin resetting during mitosis. Mol Cell 2000, 5:85-95.
27. Shirayama M, Toth A, Galova M, Nasmyth K: APC(Cdc20) promotes exit from mitosis by destroying the anaphase inhibitor Pds1 and cyclin Clb5. Nature 1999, 402:203-207.
28. Cheng L, Collyer T, Hardy CF: Cell cycle regulation of DNA replication initiator factor Dbf4p. Mol Cell Biol 1999, 19:4270-4278.
29. Oshiro G, Owens JC, Shellman Y, Sclafani RA, Li JJ: Cell cycle control of Cdc7p kinase activity through regulation of Dbf4p stability. Mol Cell Biol 1999, 19:4888-4896.
30. Godinho Ferreira M, Santocanale C, Drury LS, Diffley JFX: Dbf4p, an essential S phase promoting factor, is targeted for degradation by the anaphase promoting complex. Mol Cell Biol 2000, 20:242-248.

The role of endogenous and exogenous DNA damage and mutagenesis

Errol C Friedberg, Lisa D McDaniel and Roger A Schultz
The field of DNA damage responsiveness in general, and the consequences of endogenous and exogenous base damage in DNA in particular, has made new and exciting contributions to our increasing understanding of the initiation and progression of neoplasia in humans. This article presents some of the highlights in this area of investigation, with a particular emphasis on DNA repair, the tolerance of DNA damage and its contribution to mutagenesis, and DNA damage checkpoint regulation.

Addresses
Laboratory of Molecular Pathology, Department of Pathology, University of Texas Southwestern Medical Center, Dallas TX, 75390-9072, USA

Current Opinion in Genetics & Development 2004, 14:5–10
This review comes from a themed issue on Oncogenes and cell proliferation
Edited by Zena Werb and Gerard Evan
0959-437X/$ – see front matter
© 2003 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.gde.2003.11.001

Abbreviations
AAF acetylaminofluorene
APOBEC3G apolipoprotein B mRNA editing enzyme catalytic polypeptide-like 3G
ATM ataxia telangiectasia mutated
ATR ataxia telangiectasia and Rad3-related
ATRIP ATR-interacting protein
CSA Cockayne syndrome group A gene
CSN COP9 signalosome
FA Fanconi anemia
MAC mutagenesis in aging colonies
MMC mitomycin C
MMS methylmethane sulfonate
NER nucleotide excision repair
NHEJ non-homologous end joining
ORF open reading frame
RNAi RNA interference
RPA replication protein A
XP xeroderma pigmentosum

The role of both endogenous (spontaneous) and exogenous (environmental) DNA damage in the initiation and progression of neoplasms is unassailable. Genes that participate in various mechanisms that protect cells from the generation of mutations in somatic cells (except in selected physiological situations, such as the generation of mutations to promote variability in immunoglobulin genes) are now appropriately designated as tumor suppressor or gatekeeper genes [1]. Cells have evolved multiple, and often apparently redundant, biological responses to DNA damage that are conveniently classified as either DNA repair or DNA damage tolerance [2] (Figure 1). As the name implies, DNA repair embraces mechanisms for the enzyme-catalyzed reversal of damage and for the excision of base damage (including inappropriate bases such as uracil), as well as of nucleotides that are incorrectly incorporated during DNA replication (Figure 1). In addition to base damage, our understanding of DNA repair now embraces the restoration of both single- and double-strand breaks in the genome [3,4]. The tolerance of DNA damage involves several distinct cellular responses by which the potentially lethal effects of DNA replication arrested at damaged bases are mitigated (Figure 1). Both the efficiency and kinetics of DNA repair and DNA damage tolerance are influenced by regulatory responses. With the advent of microarray technologies, we are just beginning to appreciate the magnitude, variation and significance of the extensive transcriptional responses to various types of genomic insult (Figure 1). Additionally, the role of DNA damage in the activation of cell cycle checkpoints is now a burgeoning field, involving multiple complex pathways that transduce signals from sites of DNA damage and altered DNA replication to repair and damage tolerance effector pathways (Figure 1). In this review, we highlight some recent advances in selected aspects of the plethora of biological responses to DNA damage, with a particular emphasis on mechanisms of mutagenesis from endogenous and exogenous damage.

New insights into DNA repair by the reversal of base damage

Several forms of base damage to DNA are repaired by biochemical reactions that directly reverse the damage, restoring affected bases to their native chemistry and conformation. Among the several genes that are required for the maintenance of genomic integrity in the face of alkylation damage to DNA in Escherichia coli is one called alkB, the specific function of which was unknown until very recently. In a model example of the utility of bioinformatics, Eugene Koonin and his colleagues [5] identified a domain in the translated alkB sequence that is suggestive of an α-ketoglutarate- and Fe(II)-dependent dioxygenase. Two groups [6,7] have now independently shown that purified AlkB protein from E. coli repairs the cytotoxic lesions 1-methyladenine and 3-methylcytosine by reversing such damage via a dioxygenase reaction that requires oxygen, α-ketoglutarate and Fe(II). The bacterial alkB gene is conserved in eukaryotes, including human cells, in which there are two structural homologs, hABH2 and hABH3, that presumably subserve the same function. More recent studies [8] have demonstrated that the E. coli alkB and human hABH3 gene products also repair these lesions in RNA, extending the range of the repair of biological macromolecules to this class of polynucleotides. The repair of proteins [9] and of deoxyribonucleoside triphosphate precursors of DNA [10] has been documented previously.

Figure 1
[Diagram labels: DNA damage; DNA damage repair (alkB damage reversal, nucleotide excision repair, base excision repair, mismatch repair, single-strand break repair, double-strand break repair); DNA damage tolerance (DNA damage bypass polymerases); transcriptional activation; checkpoint (RPA, ssDNA, signal transduction, cell cycle arrest); novel gene discovery (APOBEC3G); newly described components (COP9 signalosome, ubiquitin ligase, global NER activation, transcription-coupled NER inactivation, hMLH1, hMLH1+hMSH6, cadmium inactivation, Artemis, Lig4).]
Cellular responses to DNA damage. The response to DNA damage (yellow box) results in either tolerance or damage repair (blue boxes), with damage repair represented by a variety of specific damage repair pathways. These events are modulated and facilitated by cell cycle checkpoint mechanisms that arrest cell cycle progression at various points (orange box). The recent insights into the complexities of these events summarized in this review are shown in green boxes, depicting their newly discovered role(s) in these processes.

New complexities in nucleotide excision repair

Nucleotide excision repair (NER) is a major form of repair for DNA base damage that distorts DNA structure and (among other effects) interferes with normal base pairing. As such, it is suited to many forms of exogenous base damage, such as cyclobutane pyrimidine dimers induced by exposure to sunlight, a potent evolutionary driving force. NER encompasses the repair of both transcriptionally silent and transcriptionally active regions of the genome, by somewhat distinct mechanisms [11]. These distinctions primarily center on the recognition of base damage. In particular, damage recognition in transcriptionally active DNA is believed to be effected by arrest of the transcriptional machinery. One of the gene products required for transcription-coupled NER is the CSA protein, the product of the Cockayne syndrome group A gene (CSA) [11]. A recent study [12] has identified human CSA protein in a multiprotein complex that includes the COP9 signalosome (CSN), a regulator of cullin-based ubiquitin ligases. This study also identified a second multiprotein complex containing the COP9 signalosome, except that instead of CSA protein this complex contains another protein that is required for NER: DDB2, a 48 kDa protein which, together with DDB1 protein, comprises a heterodimer that is involved in DNA damage recognition during transcription-independent NER [12]. Mutations in the DDB2 gene have been identified in several individuals with the NER-defective and skin-cancer-prone hereditary disease xeroderma pigmentosum (XP), belonging to genetic complementation group E [13,14]. The authors suggest that, following exposure to UV radiation, the DDB2 complex binds to chromatin and the COP9 signalosome dissociates and activates ubiquitin ligase E3 activity. By contrast, when the transcriptional machinery is arrested by UV-radiation-induced base damage, the CSA complex recruits the COP9 signalosome and inactivates the ubiquitin ligase [12].

Something new in mismatch repair

One of the central debates about neoplastic transformation concerns the strong mutator phenotype of neoplasms. This phenotype cannot be accounted for by simply multiplying the spontaneous mutation frequencies of individual genes. In-depth coverage of this topic is outside the scope of this review. However, one possible explanation for the strong mutator phenotype is that genes involved in DNA mismatch repair become inactivated by genetic and/or epigenetic mechanisms [15,16]. The ensuing mutator phenotype would then increase the probability (by random or possibly non-random mechanisms) that other DNA repair genes are inactivated. In support of this notion, a recent study examined clones of cells with either an inactive or an active hMSH6 gene [17] in which expression of hMLH1 was silenced by promoter hypermethylation. Cells with hMLH1 silencing combined with hMSH6 mutation had a higher mutation rate and a different mutational spectrum than cells that were wild-type for hMSH6 [17]. Most mismatch-repair defects result from inactivation of the genes that encode this DNA repair pathway. A recent study, however, has demonstrated that some exogenous mutagens can inactivate mismatch repair proteins directly [18]. Specifically, chronic exposure of yeast to cadmium inactivates mismatch repair in vivo, and this effect can be reproduced in vitro. This is the first clear demonstration of an exogenous agent promoting genomic instability through direct effects on guardians of the genome rather than on the genome itself. It warrants more extensive examination of this mechanism of environmentally induced genomic instability.

What's new in strand-break repair?

Until recently, our understanding of the details of DNA-strand-break repair, particularly in mammalian cells, lagged behind our understanding of base-damage repair. However, the past 4–5 years have witnessed impressive progress in this area. We now know that double-strand breaks can be repaired in two ways. They can be repaired by homologous recombination, through a variety of mechanisms, or by direct joining of the broken ends (non-homologous end joining [NHEJ]). NHEJ overlaps significantly with V(D)J recombination in the immune system. A newly discovered gene called Artemis has been shown to be involved in V(D)J recombination [19]. More recently, it was demonstrated that the Artemis protein is a single-strand-specific exonuclease. Artemis also forms a complex with the catalytic subunit of the DNA-dependent protein kinase (DNA-PKcs), an integral component of the NHEJ machinery [20]. Complex formation is followed by phosphorylation of Artemis by DNA-PKcs. This event converts Artemis into an endonuclease that can open the DNA hairpins generated during V(D)J recombination [20]. Defects in NHEJ in the immune system accelerate the formation of lymphomas in mice. However, NHEJ can also suppress tumors in cells that do not undergo V(D)J recombination. A recent study has demonstrated that haploinsufficiency for the Lig4 gene (which encodes DNA ligase IV) leads to non-lymphomatous tumors in a cancer-prone mouse strain [21]. Hence, even a modest reduction in NHEJ activity promotes tumorigenesis in mice.

DNA damage tolerance: the role(s) of error-prone DNA polymerases

Another area of significant progress has emerged from the discovery of a large repertoire of DNA polymerases (especially in mammalian cells) that can bypass many types of spontaneous and exogenously generated base damage, often (but not always) generating mutations [22]. In E. coli, one of these polymerases, Pol IV, encoded by the dinB gene, has been implicated in spontaneous mutagenesis [23]. Spontaneous mutagenesis occurs in both rapidly growing and stationary-phase E. coli, but by different processes. Data on whether the stationary-phase process requires Pol IV have conflicted. A recent study indicates that this confusion apparently originates from the type of mutant strain used. The dinB gene is the first gene of a four-gene operon, and the three downstream genes are of unknown function. Consequently, some mutations in dinB can have polar effects on these downstream genes. Non-polar dinB mutations do not affect spontaneous mutagenesis in rapidly growing cells [23]. It is believed that polymerase switching occurs when DNA damage blocks replication of either the leading or the lagging strand. In this model, the bypass polymerase(s) transiently occupy the primer-template to carry out translesion synthesis, and the replicative polymerase reoccupies the site once translesion synthesis is complete. If the replicative machinery were physically displaced from the primer-template during this process, replication of both DNA strands might be expected to arrest. An in vivo system was established in E. coli to address this question [24]. An acetylaminofluorene (AAF) lesion in the leading strand did not affect the kinetics of lagging-strand replication, and vice versa. Hence, whatever the nature of the polymerase switching events during translesion DNA synthesis, the replicative machinery does not appear to disengage completely from the arrested fork. Tuberculosis is a pulmonary infection in which lethal antibiotic resistance sometimes emerges. A recent study has demonstrated that exposure of Mycobacterium tuberculosis to various types of DNA base damage upregulates a gene called dnaE2, which is believed to encode a novel DNA polymerase. This upregulation increases the mutation frequency in the bacterium. The authors suggest that mutations associated with spontaneous DNA damage might underlie the antibiotic resistance shown by this organism [25].

8 Oncogenes and cell proliferation

Other aspects of spontaneous mutagenesis

It is well established that stress conditions can promote mutagenesis in laboratory-derived strains of E. coli. However, the general evolutionary significance of this phenomenon has been questioned. Laboratory strains are not representative of wild strains, which grow in diverse natural ecological niches. A recent study collected nearly 800 E. coli isolates from around the world and examined mutagenesis in aging colonies (MAC), induced by starvation after a period of exponential growth [26]. Most natural isolates exhibited increased MAC. The nature of the mutagenesis was characteristic of each strain, and the ecological niche from which a strain was isolated was a major determinant of its mutator phenotype. Nonetheless, the study supports the notion that adaptive mutagenesis associated with stress-induced mutations is a general evolutionary strategy in E. coli. The high spontaneous mutation rate of HIV is a major problem in combating AIDS, but the primary mechanism(s) of this mutagenesis remain to be established. At least three recent studies [27–29] have examined APOBEC3G (apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like 3G), an endogenous inhibitor of HIV-1 replication. These studies demonstrated that APOBEC3G is a cytidine deaminase that generates G→A mutations in the viral DNA. It presumably does so by converting cytidine to uracil in the viral DNA minus strand, which promotes the formation of U:A base pairs. This hypermutation is thought to have evolved as a defense mechanism that inactivates the virus. Even so, the accumulation of nonlethal APOBEC3G-induced mutations could promote variation in primate lentiviral populations, including HIV.
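The strand logic behind the G→A signature can be sketched as a toy model in Python. The model makes several assumptions. The viral plus strand is written as a DNA string. The deaminated positions are chosen by hand. APOBEC3G's sequence-context preferences and the later fate of the edited DNA are ignored. The function names and the `deaminated` parameter are illustrative and are not taken from the cited studies.

```python
from typing import Iterable

# Base pairing used for strand synthesis; uracil (from deaminated C) pairs with A.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def complement_strand(template: str) -> str:
    """Return the strand synthesized on `template`, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(template))


def apobec3g_plus_strand(plus: str, deaminated: Iterable[int]) -> str:
    """Toy model: C->U on the minus strand, read out on the next plus strand.

    plus: viral plus-strand sequence, 5'->3', in DNA letters.
    deaminated: hypothetical 0-based positions on the minus strand (read 5'->3')
        at which a C is deaminated; positions holding other bases are unchanged.
    """
    sites = set(deaminated)
    minus = complement_strand(plus)  # minus strand copied from the plus strand
    edited_minus = "".join(
        "U" if i in sites and base == "C" else base for i, base in enumerate(minus)
    )
    # The next plus strand copies the edited minus strand: each U templates A, not G.
    return complement_strand(edited_minus)
```

For example, `apobec3g_plus_strand("AGGT", [1])` returns `"AGAT"`. Deaminating the C at minus-strand position 1 turns the complementary G at plus-strand position 2 into A. Every change the model can produce is therefore a G→A transition on the plus strand, which is the mutational pattern described in the paragraph above.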

Checkpoint control and initiating signals for responses to DNA damage

ATR (ataxia telangiectasia and Rad3-related) and ATM (ataxia telangiectasia mutated) are kinases that play central roles in responses to various types of DNA damage, notably damage produced by ionizing radiation [30,31]. ATR is known to phosphorylate substrates such as Brca1, Chk1, p53 and Rad17. These phosphorylated substrates, in turn, inhibit DNA replication and progression through the cell cycle, and they promote DNA repair and other effector responses [30,31]. Many DNA-damaging agents can elicit the ATR-mediated DNA damage response. An issue of considerable interest is therefore whether different sensors detect different types of damage, or whether all types are processed to a common intermediate. A recent study has demonstrated that RPA (replication protein A, a single-stranded DNA binding protein) stimulates binding of the ATR–ATRIP (ATR-interacting protein) complex to single-stranded DNA. This binding in turn stimulates phosphorylation of DNA-bound Rad17 protein [32]. These findings suggest that RPA-coated single-stranded DNA is the common denominator of various types of DNA damage, and that it recruits ATR–ATRIP to initiate the DNA-damage-signaling cascade. It has also been proposed that host cells sense retroviral DNA integration as DNA damage. An independent study demonstrated that the ATR kinase (but not the ATM kinase) is required for successful integration of retroviral DNA [33].

New genes for biological responses to DNA damage?

The genomics era has facilitated the identification of new genes involved in biological responses to DNA damage. Besides expression-array studies, other techniques have been used to gain such insights. One recent study screened a collection of >2800 yeast deletion mutants and identified six novel genes involved in biological responses to UV radiation or methyl methanesulfonate (MMS). In each mutant, a single ORF had been replaced by a cassette containing two unique sequence tags. These tags allowed each mutant to be identified by hybridization to a high-density oligonucleotide array [34]. Another study used RNAi to uncover such genes in the nematode worm Caenorhabditis elegans. A total of 61 genes were found to affect genomic stability in somatic cells and spontaneous mutagenesis in the germ line [35]. Many of the genes uncovered are novel ORFs with no known function.

Fanconi anemia and BRCA

Fanconi anemia (FA) is a clinically heterogeneous human disorder associated with genomic instability, cancer predisposition and cellular sensitivity to certain DNA-damaging agents, including mitomycin C (MMC). The disease is genetically complex and is represented by eight genetic complementation groups. Genes representing six of the FA complementation groups had previously been cloned. Over the last decade, work on the functions of their products has revealed a previously unrecognized regulatory pathway in the DNA-damage response. The products of five of these genes (groups FA-A, -C, -E, -F and -G) were shown to form a complex. This complex directs monoubiquitination of the product of the gene mutated in a sixth group, FA-D2. Monoubiquitination was then found to target the FANCD2 protein to BRCA1 nuclear foci, and signaling continues through BRCA2 and RAD51 for the repair of DNA damage. RAD51 is a crucial component of homologous recombination, which suggests that the FA complex mediates error-free homologous recombination during S-phase arrest. These results are consistent with reports of an increased mutation frequency in FA cells relative to normal controls, and more specifically of an increase in deletion mutations. Two questions remained a mystery. Which genes were mutated in the other FA groups (FA-B and -D1)? And how did the protein products of these two unidentified genes participate in the same DNA-damage-signaling pathway? A recent report by Howlett et al. [36] demonstrates biallelic mutations in the BRCA2 gene in both of these complementation groups. The findings show that all eight FA gene products act in a single pathway that mediates DNA repair. This pathway responds to damage caused by agents such as MMC and diepoxybutane, both of which are known to produce crosslinks in DNA. The MMC sensitivity of FA-D1 fibroblasts was complemented by expression of wild-type BRCA2 protein. Because two distinct complementation groups are defined by mutations in a single gene, interallelic complementation or dominant activity of certain mutant alleles is likely.

The multiple mechanisms by which mutations arise in cells, and hence can trigger neoplasia, are becoming clearer for both natural (spontaneously derived) and exogenous (environmentally derived) mutations. Mutagenesis is a universal fact of life: the genetic diversity it generates in germ cells is essential for Darwinian evolution. From an evolutionary point of view, there is no intrinsic reason to suppress mutations in somatic cells, provided that they do not produce deleterious phenotypes before reproductive age. However, diseases such as XP provide dramatic evidence that preventing an excessive mutational burden in somatic cells is as essential to life as generating a threshold level of mutations in the germ line. In XP, a hereditary defect in a specific DNA repair modality predisposes individuals to multiple lethal skin cancers well before the onset of puberty. Biological evolution has apparently achieved a delicate balance in which mutations are tolerated at certain levels in both germ-line and somatic cells. The preponderance of the evidence indicates that mutations are stochastic events in genes. Hence, if they occur in genes that are crucial for normal cellular proliferation, the price might be neoplasia. There is no a priori reason to select against neoplasia in organisms past their reproductive life. There is therefore no reason to expect selection for anti-mutagenic mechanisms that specifically protect oncogenes and tumor suppressor genes, and there is no evidence of such mechanisms.

Meanwhile, our understanding of the many ways in which cells suffer genomic insult, and of the diverse responses to such damage, continues to expand. Such responses appear to be designed primarily to avoid cell death, not mutations. Indeed, in some cases, cells accept mutations as a strategy to avoid death.

We apologize to the many authors of outstanding papers that were not included here owing to space limitations and the general scope of this issue.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Levine AJ: p53, the cellular gatekeeper for growth and division. Cell 1997, 88:323-331.
2. Sutton MD, Smith BT, Godoy VG, Walker GC: The SOS response: recent insights into umuDC-dependent mutagenesis and DNA damage tolerance. Annu Rev Genet 2000, 34:479-497.
3. Lisby M, Mortensen UH, Rothstein R: Colocalization of multiple DNA double-strand breaks at a single Rad52 repair centre. Nat Cell Biol 2003, 5:572-577.
4. Taylor RM, Thistlethwaite A, Caldecott KW: Central role for the XRCC1 BRCT I domain in mammalian DNA single-strand break repair. Mol Cell Biol 2002, 22:2556-2563.
5. Aravind L, Koonin EV: The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol 2001, 2:research0007.1-0007.8.


6. Trewick SC, Henshaw TF, Hausinger RP, Lindahl T, Sedgwick B: Oxidative demethylation by Escherichia coli AlkB directly reverts DNA base damage. Nature 2002, 419:174-178.
See annotation [7].
7. Falnes PO, Johansen RF, Seeberg E: AlkB-mediated oxidative demethylation reverses DNA damage in Escherichia coli. Nature 2002, 419:178-182.
These two papers [6,7] were published together and are the first direct demonstration of the repair of 1-methyladenine and 3-methylcytosine by AlkB protein through direct reversal of the lesion.
8. Aas PA, Otterlei M, Falnes PO, Vagbo CB, Skorpen F, Akbari M, Sundheim O, Bjoras M, Slupphaug G, Seeberg E et al.: Human and bacterial oxidative demethylases repair alkylation damage in both RNA and DNA. Nature 2003, 421:859-863.
9. Schubert HL, Blumenthal RM, Cheng XD: Many paths to methyltransfer: a chronicle of convergence. Trends Biochem Sci 2003, 28:329-335.


10. Hayakawa H, Taketomi A, Sakumi K, Kuwano M, Sekiguchi M: Generation and elimination of 8-oxo-7,8-dihydro-2′-deoxyguanosine 5′-triphosphate, a mutagenic substrate for DNA synthesis, in human cells. Biochemistry 1995, 34:89-95.
11. Friedberg EC, Walker GC, Siede W: DNA Repair and Mutagenesis. Washington DC: ASM Press; 1995.
12. Groisman R, Polanowska J, Kuraoka I, Sawada J, Saijo M, Drapkin R, Kisselev AF, Tanaka K, Nakatani Y: The ubiquitin ligase activity in the DDB2 and CSA complexes is differentially regulated by the COP9 signalosome in response to DNA damage. Cell 2003, 113:357-367.
This study yields new and provocative information about the possible role of the COP9 signalosome in nucleotide excision repair. The COP9 signalosome (CSN), a regulator of cullin-based ubiquitin ligases, is part of two different complexes, one containing DDB and another containing CSA. The study demonstrates that, in response to UV irradiation, CSN regulates the ubiquitin ligase activity of the DDB2 and CSA complexes differently.


13. Dualan R, Brody T, Keeney S, Nichols AF, Admon A, Linn S: Chromosomal localization and cDNA cloning of the genes (DDB1 and DDB2) for the p127 and p48 subunits of a human damage-specific DNA binding protein. Genomics 1995, 29:62-69.
14. Rapic-Otrin V, Navazza V, Nardo T, Botta E, McLenigan M, Bisi DC, Levine AS, Stefanini M: True XP group E patients have a defective UV-damaged DNA binding protein complex and mutations in DDB2 which reveal the functional domains of its p48 product. Hum Mol Genet 2003, 12:1507-1522.
15. Shibata D, Aaltonen LA: Genetic predisposition and somatic diversification in tumor development and progression. Adv Cancer Res 2001, 80:83-114.
16. Kolodner R: Biochemistry and genetics of eukaryotic mismatch repair. Genes Dev 1996, 10:1433-1442.
17. Baranovskaya S, Soto JL, Perucho M, Malkhosyan SR: Functional significance of concomitant inactivation of hMLH1 and hMSH6 in tumor cells of the microsatellite mutator phenotype. Proc Natl Acad Sci USA 2001, 98:15107-15112.
18. Jin YH, Clark AB, Slebos RJ, Al-Refai H, Taylor JA, Kunkel TA, Resnick MA, Gordenin DA: Cadmium is a mutagen that acts by inhibiting mismatch repair. Nat Genet 2003, 34:326-329.
A fascinating study on the apparent direct effect of cadmium on mismatch repair. The results show that an exogenous agent can directly interfere with DNA-repair pathways and enhance genomic instability.
19. Moshous D, Callebaut I, de Chasseval R, Corneo B, Cavazzana-Calvo M, Le Deist F, Tezcan I, Sanal O, Bertrand Y, Philippe N et al.: Artemis, a novel DNA double-strand break repair/V(D)J recombination protein, is mutated in human severe combined immune deficiency. Cell 2001, 105:177-186.
20. Ma Y, Pannicke U, Schwarz K, Lieber MR: Hairpin opening and overhang processing by an Artemis/DNA-dependent protein kinase complex in nonhomologous end joining and V(D)J recombination. Cell 2002, 108:781-794.
This study shows that Artemis protein is a single-stranded-DNA-specific exonuclease that is phosphorylated by DNA-PKcs. Artemis protein is shown to possess single-strand-specific 5′→3′ exonuclease activity. After phosphorylation by DNA-PKcs, Artemis gains endonucleolytic activity on 5′ and 3′ overhangs, as well as on hairpins. In addition, the complex can open hairpins generated by the RAG complex.
21. Sharpless NE, Ferguson DO, O'Hagan RC, Castrillon DH, Lee C, Farazi PA, Alson S, Fleming J, Morton CC, Frank K et al.: Impaired nonhomologous end-joining provokes soft tissue sarcomas harboring chromosomal translocations, amplifications, and deletions. Mol Cell 2001, 8:1187-1196.
22. Friedberg EC, Wagner R, Radman M: Specialized DNA polymerases, cellular survival, and the genesis of mutations. Science 2002, 296:1627-1630.
23. McKenzie GJ, Magner DB, Lee PL, Rosenberg SM: The dinB operon and spontaneous mutation in Escherichia coli. J Bacteriol 2003, 185:3972-3977.
A study showing that the dinB gene is part of a polycistronic operon. Some mutations in dinB affect downstream genes, whereas others do not. These results account for conflicting data regarding the role of Pol IV in spontaneous mutagenesis in E. coli.
24. Pages V, Fuchs RP: Uncoupling of leading- and lagging-strand DNA replication during lesion bypass in vivo. Science 2003, 300:1300-1303.
This study examines the kinetics of leading- and lagging-strand DNA synthesis in vivo. It indicates that synthesis of the two strands is uncoupled when a blocking lesion lies on either strand. The replicative machinery does not appear to disengage from the arrested fork while the bypass polymerase performs translesion DNA synthesis.
25. Boshoff HI, Reed MB, Barry CE III, Mizrahi V: DnaE2 polymerase contributes to in vivo survival and the emergence of drug resistance in Mycobacterium tuberculosis. Cell 2003, 113:183-193.

An interesting study that suggests a possible mechanism of antibiotic resistance during infection with M. tuberculosis. Resistance appears to result from upregulation of dnaE2, a gene with homology to the major replicative polymerase, in response to DNA damage.
26. Bjedov I, Tenaillon O, Gerard B, Souza V, Denamur E, Radman M, Taddei F, Matic I: Stress-induced mutagenesis in bacteria. Science 2003, 300:1404-1409.
This study supports the notion that adaptive mutagenesis associated with stress-induced mutations is a general evolutionary strategy in E. coli.
27. Harris RS, Bishop KN, Sheehy AM, Craig HM, Petersen-Mahrt SK, Watt IN, Neuberger MS, Malim MH: DNA deamination mediates innate immunity to retroviral infection. Cell 2003, 113:803-809.
See annotation [29].
28. Zhang H, Yang B, Pomerantz RJ, Zhang C, Arunachalam SC, Gao L: The cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1 DNA. Nature 2003, 424:94-98.
See annotation [29].
29. Mangeat B, Turelli P, Caron G, Friedli M, Perrin L, Trono D: Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts. Nature 2003, 424:99-103.
These three studies [27–29] suggest a role for cytosine deamination in the generation of mutations in lentiviruses. The gene product of APOBEC3G is a cytidine deaminase that generates G→A mutations in viral DNA. This gene might have evolved as a viral defense; however, it may also contribute to viral diversity.
30. Zhou BB, Elledge SJ: The DNA damage response: putting checkpoints in perspective. Nature 2000, 408:433-439.
31. Abraham RT: Cell cycle checkpoint signaling through the ATM and ATR kinases. Genes Dev 2001, 15:2177-2196.
32. Zou L, Elledge SJ: Sensing DNA damage through ATRIP recognition of RPA-ssDNA complexes. Science 2003, 300:1542-1548.
This study proposes that RPA bound to single-stranded DNA recruits ATR–ATRIP to the site, initiating a signal-transduction cascade that effects checkpoint regulation.
33. Daniel R, Kao G, Taganov K, Greger JG, Favorova O, Merkel G, Yen TJ, Katz RA, Skalka AM: Evidence that the retroviral DNA integration process triggers an ATR-dependent DNA damage response. Proc Natl Acad Sci USA 2003, 100:4778-4783.
Retroviral integration can, apparently, activate the checkpoint response. ATR kinase, but not ATM kinase, is required for completion of retroviral integration into the genome.
34. Hanway D, Chin JK, Xia G, Oshiro G, Winzeler EA, Romesberg FE: Previously uncharacterized genes in the UV- and MMS-induced DNA damage response in yeast. Proc Natl Acad Sci USA 2002, 99:10605-10610.
This study identified novel genes involved in biological responses to DNA damage in yeast. Screening of 2,827 yeast gene deletion strains identified six genes that were not previously known to be involved in the DNA damage response. Deletion of these genes results in UV and MMS sensitivity.
35. Pothof J, van Haaften G, Thijssen K, Kamath RS, Fraser AG, Ahringer J, Plasterk RH, Tijsterman M: Identification of genes that protect the C. elegans genome against mutations by genome-wide RNAi. Genes Dev 2003, 17:443-448.
This study used an RNAi strategy to identify multiple genes in C. elegans that are required for genomic stability. Sixty-one genes were identified, many of which were novel.
36. Howlett NG, Taniguchi T, Olson S, Cox B, Waisfisz Q, De Die-Smulders C, Persky N, Grompe M, Joenje H, Pals G et al.: Biallelic inactivation of BRCA2 in Fanconi anemia. Science 2002, 297:606-609.
Most of the genes responsible for the eight FA complementation groups have been identified. Their products belong to the same protein complex, which directs the monoubiquitination of FA-D2. The resulting protein interacts with BRCA1, and signaling continues through BRCA2 and RAD51.
This study identified mutations in the BRCA2 gene in FANCD1 and FANCB patients.

Mechanisms of Recombination

19. Mechanisms of Recombination

Key Concepts

Recombination occurs at regions of homology between chromosomes through the breakage and reunion of DNA molecules. Models for recombination, such as the Holliday model, involve the creation of a heteroduplex branch, or cross bridge, that can migrate and the subsequent splicing of the intermediate structure to yield different types of recombinant DNA molecules. Recombination models can be applied to explain genetic crosses. Many of the enzymes participating in recombination in bacteria have been identified.

Throughout our analysis of linkage, we studied the recombination of genes by crossing-over. In this chapter, we consider molecular mechanisms for generating recombination by crossing-over. Figure 19-1 depicts a basic crossover event, in which two homologous molecules are aligned and subsequently undergo recombination. When Benzer's work and that of others revealed that recombination occurs within genes, it became evident that recombination had to be very precise, because even single-base-pair errors could disrupt the integrity of the gene. How can recognition of homologous chromosomes and recombination events be so precise? The answer lies in the power of base-pair complementarity. We shall see how base-pair complementarity and the formation of heteroduplex regions between complementary regions of homologous chromosomes lead to the recombination events that we have been studying.


Figure 19-1. The molecular event of recombination may be schematically represented by two double-stranded molecules breaking and rejoining.

Breakage and reunion of DNA molecules

The experiments discussed in Chapter 5 provide good indirect evidence in favor of breakage and reunion. One of the first direct proofs that chromosomes (in this case, viral chromosomes) can break and rejoin came from experiments on phage λ done in 1961 by Matthew Meselson and Jean Weigle. They simultaneously infected E. coli with two strains of λ. One strain carried the genetic markers c and mi at one end of the chromosome. It was heavy because the phages were produced from cells grown in heavy isotopes of carbon (¹³C) and nitrogen (¹⁵N). The other strain was c+ mi+ for the markers. It had light DNA because it was harvested from cells grown on the normal light isotopes ¹²C and ¹⁴N. The two DNAs (chromosomes) can be represented as shown in Figure 19-2a. The multiply infected cells were then incubated in a light medium until they lysed. The progeny phages released from the cells were spun in a cesium chloride density gradient. A wide band was obtained, indicating that the viral DNAs ranged in density from the heavy parental value to the light parental value, with a great many intermediate densities (Figure 19-2b). Interestingly, some recombinant phages were recovered with density values very close to the heavy parental value. They were of genotype c mi+, and they must have arisen through an exchange event between the two markers (Figure 19-2c). Their heavy density is expected because only the small tip of the chromosome carrying the mi+ allele would come from the light parental chromosome. In the reciprocal cross of heavy c+ mi+ phages to light c mi phages, the heavy recombinants were found to be c+ mi, as expected. These results can be explained in only one way: the recombination event must have occurred through the physical breakage and reunion of DNA. Although we have to be careful about extrapolating from viral to eukaryotic chromosomes, this evidence shows that the breakage and reunion of DNA strands does occur.

Figure 19-2. Evidence for chromosome breakage and reunion in λ phages. (a) The chromosomes of the two strains used to multiply infect E. coli. (b) Bands produced when progeny phages are spun in a cesium chloride density gradient. The intermediate densities indicate a range of chromosome compositions with partly light and partly heavy components. (c) The chromosome of the heavy c mi+ progeny resulting from a crossover between the two markers. The density of this crossover product confirms that the crossover entailed a physical breakage and reunion of the DNA.

Chiasmata: the crossover points

In Chapter 5, we made the simple assumption that chiasmata are the actual sites of crossovers. Mapping analysis indirectly supports this idea. Because an average of one crossover per meiosis produces 50 genetic map units, the size of a chromosome's genetic map should correlate with the observed mean number of chiasmata per meiosis. This correlation has been observed in well-mapped organisms. However, the harlequin chromosome-staining technique (see Chapter 8) has made it possible to test the idea directly. In 1978, C. Tease and G. H. Jones prepared harlequin chromosomes in meioses of the locust. Remember that the harlequin technique produces sister chromatids of which one is dark and the other light. A crossover can occur between two dark chromatids, between two light chromatids, or between nonsister dark and light chromatids, as shown in Figure 8-16. This last situation is crucial because it produces mixed (part dark and part light) crossover chromatids. Tease and Jones found that the dark–light transition lies right at the chiasma. This finding proved that the chiasmata are the crossover sites and settled a question that had been unresolved since the early 1900s (Figure 19-3).
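The map-length argument rests on a short calculation. A chiasma joins two of the four chromatids of a bivalent, so one crossover makes half of the meiotic products recombinant, which is 50 map units. The following is a minimal Python sketch of that bookkeeping. It assumes that every chiasma marks exactly one crossover and ignores chiasma interference. The function and constant names are illustrative and do not come from the text.

```python
# Assumption: each chiasma is one crossover between 2 of the 4 chromatids.
RECOMBINANT_FRACTION_PER_CROSSOVER = 2 / 4


def map_length_cM(mean_chiasmata_per_meiosis: float) -> float:
    """Expected genetic map length (map units) from the mean chiasma count.

    Each crossover makes 50 percent of the meiotic products recombinant,
    so each chiasma contributes 50 map units.
    """
    if mean_chiasmata_per_meiosis < 0:
        raise ValueError("chiasma count cannot be negative")
    return mean_chiasmata_per_meiosis * RECOMBINANT_FRACTION_PER_CROSSOVER * 100


def expected_mean_chiasmata(map_length: float) -> float:
    """Invert the relation: mean chiasmata per meiosis predicted from map length."""
    return map_length / (RECOMBINANT_FRACTION_PER_CROSSOVER * 100)
```

For example, `map_length_cM(1)` gives 50.0. A chromosome with a 150-unit map is predicted to show `expected_mean_chiasmata(150)`, that is, 3.0 chiasmata per meiosis on average. This predicted value is the kind of quantity that observed chiasma counts can be compared against.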

Genetic results leading to recombination models

Tetrad analysis in filamentous fungi, such as Neurospora crassa, where all four products of a single meiosis can be recovered and examined (see Chapter 6), provided the impetus for the first models of intragenic recombination. These crucial findings, reviewed in the following list, were gene conversion, evidence for postmeiotic segregation of gene-conversion events, polarity, and the association of gene conversion with crossing-over.
1. Gene conversion. Departures from the predicted Mendelian 4:4 segregation ratios are detectable in some asci (0.1–1.0 percent in filamentous fungi, but as high as 4 percent in yeast). Figure 19-4 gives the most common aberrant ratios obtained. It appears as though some alleles in the cross have been converted into the opposite alleles (Figure 19-5). The process has therefore become known as gene conversion. It can occur only where there is heterozygosity for two different alleles of a gene. In asci with a 6:2 or 2:6 ratio, one entire chromatid of a chromosome seems to have converted. In asci with a 5:3 or 3:5 ratio, only half a chromatid seems to have converted. In these asci, the two members of a spore pair have different genotypes. Recall that each spore pair is produced by mitosis from a single product of meiosis. The 5:3 and 3:5 ratios can therefore be explained only if the two strands of a double helix carry information for two different alleles at the conclusion of meiosis. The next mitotic division then produces a postmeiotic segregation of alleles.

Conversion cannot be explained by mutation, because the allele that is converted always changes into the other specific allele taking part in the cross, not to some other allele known for the locus but not a part of the cross.
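
Reading an ascus for these ratio classes is mechanical enough to express in code. This minimal Python sketch (the function name and input format are hypothetical) assumes an eight-spore linear ascus scored at a single locus with alleles + and m, written in ascus order so that consecutive spores form the four spore pairs.

```python
def classify_octad(spores: str) -> str:
    """Classify an 8-spore ascus scored for alleles '+' and 'm'.

    Spores are listed in ascus order, so positions (0, 1), (2, 3), (4, 5)
    and (6, 7) are the spore pairs, each arising from the postmeiotic
    mitosis of one meiotic product.
    """
    if len(spores) != 8 or set(spores) - {"+", "m"}:
        raise ValueError("expected 8 spores, each '+' or 'm'")
    plus = spores.count("+")
    # A spore pair with nonidentical members reveals postmeiotic segregation.
    mixed_pairs = sum(1 for i in range(0, 8, 2) if spores[i] != spores[i + 1])
    if plus == 4:
        return "aberrant 4:4" if mixed_pairs else "4:4 (Mendelian)"
    return f"{plus}:{8 - plus} (gene conversion)"


if __name__ == "__main__":
    for ascus in ("++++mmmm", "++++++mm", "+++++mmm", "+++m+mmm"):
        print(ascus, classify_octad(ascus))
```

Note that the spore counts alone cannot distinguish a Mendelian 4:4 ascus from an aberrant 4:4 (3:1:1:3) one; only the nonidentical spore pairs reveal the difference.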

Figure 19-4. Rare aberrant allele ratios observed in a cross of type + × m in fungi. (Ascus genotypes are represented here.) When the Mendelian ratio of 4:4 is not obtained, some of the alleles in the cross have been converted into the opposite alleles. In some asci, it appears that the entire chromatid has been converted (6:2 or 2:6 ratios). In others, it appears that only half-chromatids have been converted (5:3, 3:5, or 3:1:1:3 ratios).

Figure 19-5. Gene conversions are inferred from the patterns of alleles observed in asci. (a) In a chromatid conversion, the allele on one chromatid seems somehow to have been converted into an allele like those on the other chromatid pair; the converted allele is marked in the figure. One spore pair is of the opposite genotype from that expected in Mendelian segregation. (b) In a half-chromatid conversion, one spore pair (*) has nonidentical alleles. Somehow, one chromatid seems to be half converted, giving rise to one spore of the original genotype and one spore converted into the other allele.

Figure 19-6. Diagram of chromatids taking part in a cross in regions I and II of a gene. The arrow indicates the polarity of gene conversion in the m locus, pointing toward the end with lower conversion frequency.

Figure 19-7. A specific ascus pattern can be explained by both crossover and a chromatid conversion. In this case, a conversion of m1 → + is accompanied by a crossover in the region between a and m1.

Figure 19-8. Sample ascus patterns obtained from a single conversion and co-conversions.

Holliday model
One of the first plausible models to account for the preceding observations was formulated by Robin Holliday. The key features of the Holliday model are the formation of heteroduplex DNA; the creation of a cross bridge; its migration along the two heteroduplex strands, termed branch migration; the occurrence of mismatch repair; and the subsequent resolution, or splicing, of the intermediate structure to yield different types of recombinant molecules. The model is depicted in Figure 19-9.
Enzymatic cleavage and the creation of heteroduplex DNA

Looking at Figure 19-9a, we can see that two homologous double helices are aligned; note that they have been rotated so that the bottom strand of the first helix has the same polarity as the top strand of the second helix (5′ → 3′ in this case). Then a nuclease cleaves the two strands that have the same polarity (Figure 19-9b). The free ends leave their original complementary strands and form hydrogen bonds with the complementary strands in the homologous double helix (Figure 19-9c). Ligation produces the structure shown in Figure 19-9d. This partially heteroduplex double helix is a crucial intermediate in recombination and has been termed the Holliday structure.
Branch migration

The Holliday structure creates a cross bridge, or branch, that can move, or migrate, along the heteroduplex (Figure 19-9d and e). This phenomenon of branch migration is a distinctive property of the Holliday structure. Figure 19-10 portrays a more realistic view of this structure as it might appear during branch migration.
Resolution of the Holliday structure

The Holliday structure can be resolved by cutting and ligating either the two originally exchanged strands (Figure 19-9f, left) or the two originally unexchanged strands (Figure 19-9f, right). The former generates a pair of duplexes that are parental, except for a stretch in the middle containing one strand from each parent. If the two parents had different alleles in this stretch, as indicated here, then the DNA will be heteroduplex. The latter resolution generates two duplexes that are recombinant for the flanking regions, each with a stretch of heteroduplex DNA.

The Holliday model also postulated that mismatches in the heteroduplex DNA can be repaired by an enzymatic correction system that recognizes a mismatch, excises the mismatched base from one of the two strands, and fills the gap with the base complementary to the remaining strand. The resulting molecule will carry either the wild-type or the mutant allele, depending on which strand is excised.

Figure 19-11 demonstrates one way to visualize how the Holliday structure can be converted into the recombinant structures with which we are familiar. In Figure 19-11a, we can see the structure that we arrived at in Figure 19-9e drawn out in an extended form. Compare Figures 19-9e and 19-11a until you are convinced that these two structures are indeed equivalent. If we rotate the bottom part of this structure, as shown in Figure 19-11b, we can generate the form depicted in Figure 19-11c. This last form can be converted back into two unconnected double helices by enzymatically cleaving only two strands. As indicated in Figure 19-11c, cleavage can occur in either of two ways, each of which generates a different product (Figure 19-11d). These cleaved structures can be viewed more simply (Figure 19-11e). Repair synthesis produces the final recombinant molecules (Figure 19-11f). Note the two different types of recombinants.
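
The two resolution options can be tracked with flanking markers. In this minimal Python sketch, A/a and B/b are hypothetical markers flanking the heteroduplex region, and the two cutting orientations are assumed to be equally likely; the simulation simply confirms that about half of all resolutions then recombine the flanking markers.

```python
import random


def resolve(cut_exchanged_strands: bool) -> tuple[str, str]:
    """Flanking-marker combinations of the two duplexes after resolution.

    The parental duplexes carry A...B and a...b. Cutting the two originally
    exchanged strands leaves the flanking markers parental (with a patch of
    heteroduplex in the middle); cutting the two originally unexchanged
    strands joins A with b and a with B.
    """
    if cut_exchanged_strands:
        return ("A-B", "a-b")
    return ("A-b", "a-B")


def crossover_fraction(n_junctions: int, seed: int = 1) -> float:
    """Fraction of junctions whose resolution recombines the flanking markers."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    crossovers = 0
    for _ in range(n_junctions):
        products = resolve(cut_exchanged_strands=rng.random() < 0.5)
        if products == ("A-b", "a-B"):
            crossovers += 1
    return crossovers / n_junctions


if __name__ == "__main__":
    print(f"crossover fraction: {crossover_fraction(100_000):.3f}")
```

Which strands are cut does not affect whether heteroduplex DNA is present at the locus; it decides only whether the markers outside that region are exchanged.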
Application of the Holliday model to genetic crosses

The Holliday model nicely explained the phenomena that we described previously. Gene conversion and the aberrant ratios depicted in Figure 19-4 can result from mismatch repair, as shown in Figure 19-12 and Table 19-1. In Table 19-1, the symbols + for wild type and m for mutant are used for simplicity. When both mismatches are corrected to yield the same parental type, 6:2 or 2:6 ratios result; when only one heteroduplex is corrected, a 5:3 or 3:5 ratio results; and, when there is no correction, an aberrant 4:4 ratio is the product. The Holliday model also accounted for the polarity of gene-conversion events, because conversion takes place only within the heteroduplex DNA between the break point and the branch point at which the Holliday structure is resolved. The farther a gene locus is from the break point, the more likely it is to lie beyond the branch point and thus outside the heteroduplex. It should be noted that the phenomenon of gene conversion and its association with crossing-over in about half the cases was a driving force for the formulation of the Holliday model, which entails a strand exchange that results in reciprocal crossovers almost half the time. This 50 percent figure arises because the exchange point can be resolved in two equally likely ways, as seen in Figure 19-9f, one of which produces crossing-over of markers outside the region of heteroduplex DNA. Co-conversion is explained by the location of both sites in the region of heteroduplex DNA and by the excision of both sites in the same excision-repair act. This double excision converts both sites into the same parental type.
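
The correspondence between repair outcomes and ascus ratios can be enumerated directly. This minimal Python sketch assumes the symmetric Holliday arrangement, in which one chromatid from each parent carries +/m heteroduplex DNA at the locus and the other two chromatids are +/+ and m/m; each heteroduplex is corrected to +, corrected to m, or left unrepaired, and its two strands then separate at the postmeiotic mitosis. Function names are illustrative.

```python
from itertools import product

# Fate of one +/m heteroduplex: corrected to '+', corrected to 'm', or
# left unrepaired (None).
FATES = ("+", "m", None)


def spore_pair(fate):
    """Spore pair produced by one heteroduplex chromatid."""
    return ("+", "m") if fate is None else (fate, fate)


def octad_ratio(fate1, fate2) -> str:
    """Ratio class of the ascus when the two heteroduplexes have these fates."""
    pairs = [("+", "+"), spore_pair(fate1), spore_pair(fate2), ("m", "m")]
    plus = sum(pair.count("+") for pair in pairs)
    unrepaired = sum(1 for a, b in pairs if a != b)
    if plus == 4 and unrepaired:
        return "3:1:1:3"  # aberrant 4:4: two nonidentical spore pairs
    return f"{plus}:{8 - plus}"


if __name__ == "__main__":
    for fate1, fate2 in product(FATES, repeat=2):
        print(fate1, fate2, octad_ratio(fate1, fate2))
```

Correction of the two heteroduplexes in opposite directions restores an ordinary 4:4 ascus, which is why that combination does not appear among the aberrant ratios.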
Meselson-Radding model

As the data from tetrad analyses accumulated, it became clear that the Holliday model could not explain everything. For instance, the two mismatches resulting from the two heteroduplexes (see Figures 19-9e and 19-12) should be manifested in the progeny from a cross, yielding aberrant 4:4 tetrads. Yet, tetrad analyses in yeast and other organisms showed that, whereas 6:2 tetrads were frequent among gene-conversion events, aberrant 4:4 tetrads were very rare. It seemed as if gene conversion and the formation of heteroduplex DNA occurred primarily in only one chromatid. The model proposed by Meselson and Radding (shown in Figure 19-13) generates the Holliday structure with one single-strand cut in only one chromosome (Figure 19-13a), in contrast with the Holliday model, in which a nick is made in one strand in each of the two homologous chromatids. This single-strand cut is followed by DNA synthesis (Figure 19-13b). After the nick, the displaced single strand invades the second duplex (Figure 19-13c), generating a loop, which is excised (Figure 19-13d). After ligation to produce a Holliday structure, followed by branch migration (Figure 19-13e), a heteroduplex is generated in each chromosome. Resolution of this intermediate (Figure 19-13f) occurs exactly as depicted in Figure 19-9 (or after rotation, as in Figure 19-11). Note the lack of symmetry in the heteroduplex DNA at resolution in Figure 19-13f (left), compared with Figures 19-9f and 19-11f. Thus, in the Meselson-Radding model (left side), the bottom chromatid duplex (in Figure 19-13) has a heteroduplex region, instead of the two chromatids having heteroduplex regions, as in the Holliday model. However, branch migration and isomerization can generate a structure that has heteroduplex regions on both duplexes (Figure 19-13, right side), which is required to explain aberrant 4:4 ratios (Table 19-1).
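
The argument about aberrant 4:4 asci can be checked by enumeration. The minimal Python sketch below (names are illustrative) compares the ratio classes reachable when heteroduplex DNA lies on two nonsister chromatids, as in the Holliday model, with those reachable when it lies on only one chromatid, as in the Meselson-Radding model before branch migration; it assumes each heteroduplex is corrected to +, corrected to m, or left unrepaired.

```python
from itertools import product

FATES = ("+", "m", None)  # corrected to '+', corrected to 'm', unrepaired

# Spore pairs from the chromatids WITHOUT heteroduplex DNA at the locus.
SCENARIOS = {
    # Holliday: heteroduplex on one chromatid from each parent.
    "symmetric": [[("+", "+"), ("m", "m")]],
    # Meselson-Radding, no branch migration: a single recipient chromatid,
    # which may come from either parent.
    "asymmetric": [
        [("+", "+"), ("+", "+"), ("m", "m")],  # recipient from the m parent
        [("+", "+"), ("m", "m"), ("m", "m")],  # recipient from the + parent
    ],
}


def spore_pair(fate):
    return ("+", "m") if fate is None else (fate, fate)


def ratio_label(pairs) -> str:
    plus = sum(pair.count("+") for pair in pairs)
    mixed = sum(1 for a, b in pairs if a != b)
    return "3:1:1:3" if plus == 4 and mixed else f"{plus}:{8 - plus}"


def reachable_classes(scenario: str) -> set[str]:
    classes = set()
    for intact in SCENARIOS[scenario]:
        n_heteroduplex = 4 - len(intact)
        for fates in product(FATES, repeat=n_heteroduplex):
            pairs = intact + [spore_pair(f) for f in fates]
            classes.add(ratio_label(pairs))
    return classes


if __name__ == "__main__":
    for name in SCENARIOS:
        print(name, sorted(reachable_classes(name)))
```

The asymmetric case yields 6:2, 5:3, and their mirror images but never 3:1:1:3, because a single heteroduplex can leave at most one nonidentical spore pair.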
Double-strand break-repair model for recombination

In the Holliday and Meselson-Radding models for genetic recombination, the initiating events are single-strand nicks that result in the generation of heteroduplex DNA. However, the finding that yeast transformation is stimulated 1000-fold when a double-strand break is introduced into a circular donor plasmid provided the impetus for an additional model, the double-strand-break model shown in Figure 19-14. Originally formulated by Jack Szostak, Terry Orr-Weaver, and Rodney Rothstein, this model invokes double-strand breaks to initiate recombination. The breaks are enlarged to gaps, and the repair of the double-stranded gap results in gene conversion. The key features of this model are diagrammed in the steps in Figure 19-14: (1) a double-strand break, followed by digestion of the 5′ ends on both sides of the break; (2) invasion of the other, uncut duplex by one of the remaining 3′ tails; (3) repair synthesis of one strand; (4) repair synthesis of the other strand, and ligation to form two Holliday junctions; (5) resolution in one of two ways, one of which generates a reciprocal crossover; and (6) mismatch repair correction to yield gene conversion.
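
The gene-conversion step of this model can be reduced to a simple rule: alleles lying within the degraded gap are resynthesized from the intact homolog. The minimal Python sketch below assumes the alleles at each site are listed in order along each chromatid and ignores the heteroduplex DNA that can flank the gap; the marker names a/A, d/D, and b/B are hypothetical.

```python
def repair_gap(recipient: list[str], donor: list[str], gap: range) -> list[str]:
    """Alleles of the broken chromatid after double-strand-gap repair.

    Sites inside `gap` were lost from the broken chromatid and are copied
    from the intact homolog (the donor), so they are converted to the
    donor's alleles; sites outside the gap keep the recipient's alleles.
    """
    if len(recipient) != len(donor):
        raise ValueError("both chromatids must list the same sites")
    return [donor[i] if i in gap else allele for i, allele in enumerate(recipient)]


if __name__ == "__main__":
    broken = ["a", "d", "b"]  # hypothetical markers; the gap covers site 1
    intact = ["A", "D", "B"]
    print(repair_gap(broken, intact, range(1, 2)))  # ['a', 'D', 'b']
```

The flanking markers stay parental in this sketch; in the model, whether they are exchanged is decided separately, by how the two Holliday junctions are resolved in step 5.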

The phenomenon of gene conversion led to the development of heteroduplex models to explain the mechanism of crossing-over. Mendelian (1:1) allele ratios are normally observed in crosses because it is only rarely that a heterozygous locus is the precise point of chromosome exchange. Asci showing gene conversion at a typical heterozygous locus are relatively rare (on the order of 1 percent).

Visualization of recombination intermediates

Several of the individual steps that constitute the Holliday model have been demonstrated to occur in vivo or in vitro, such as nicking, strand displacement, branch migration, repair synthesis, and ligation. H. Potter and D. Dressler showed that DNA intermediates of the type predicted by the Holliday model can be found in recombining phages or plasmids. Figure 19-15 shows an electron micrograph of a recombinant molecule. It is formally equivalent to the central pair of DNA double helices shown in Figure 19-11, with two arms rotated to produce the central single-stranded diamond, as in Figure 19-11c, which runs between the sections of double-stranded DNA.

Figure 19-9. A prototype mechanism for genetic recombination. (a) Two homologous double helices are shown. Each pair represents a chromatid and the two pairs represent two nonsister chromatids. The helices are aligned so that the bottom strand of the first helix has the same polarity as the top strand of the second helix. (b) Two parallel or two antiparallel strands are cut. (c) The free ends become associated with the complementary strands in the homologous double helix. (d) Ligation creates partly heteroduplex double helices, the Holliday structure. (e) Migration of the branch point is by continued strand transfer by the two polynucleotide chains taking part in the crossover. (f) Resolution can occur in one of two ways, which will be described in detail later. (After H. Potter and D. Dressler, Cold Spring Harbor Symposium on Quantitative Biology 43, 1979, 970. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.)

Figure 19-10. Branch migration, the movement of the crossover point between DNA complexes. (After T. Broker, Journal of Molecular Biology 81, 1973, 1; from J. D. Watson et al., Molecular Biology of the Gene, 4th ed. Copyright 1987 by Benjamin


Figure 19-11. (a) The Holliday structure shown in an extended form. (b) The rotation of the structure shown in part a can yield the form depicted in part c. Resolution of the structure shown in part c can proceed in two ways, depending on the points of enzymatic cleavage, yielding the structures shown in part d. The dotted lines show which segments will rejoin to form recombinant strands for each particular cleavage scheme. The strands are shown linearly in part e and can be repaired to the forms shown in part f. (From H. Potter and D. Dressler, Cold Spring Harbor Symposium on Quantitative Biology 43, 1979, 970. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.)

Figure 19-12. When a mismatch is generated within a heteroduplex region, mismatch repair converts the mismatch into either the wild-type or the mutant sequence.

Figure 19-13. The Meselson-Radding heteroduplex model. (a) A duplex is cut on one chain. (b) DNA polymerase displaces one chain. (c) The resulting single chain displaces its counterpart in the homolog. (d) This displaced chain is enzymatically digested. (e) Ligation completes the formation of a Holliday junction, which is genetically asymmetric in that only one of the two duplexes has a region of potentially heteroduplex DNA. If the junction migrates, heteroduplex DNA can arise on both duplexes. (f) Resolution of the junction occurs as in the Holliday model. (From F. W. Stahl, The Holliday Junction on Its Thirtieth Anniversary, Genetics 138, 1994, 241-246.)


Figure 19-14. Double-strand-break model of meiotic recombination in the yeast S. cerevisiae. The two strands of each of two chromatids (in blue and yellow) are shown. The darker and lighter colors indicate complementary strands, as do the matching primes (for example, C and C′ or c and c′ for the allele designations). A key point of this model is shown in steps 3 and 4, which picture the generation of a Holliday structure with two crossover points. Two different ways of resolving this structure lead to the recombinant or nonrecombinant duplexes seen in steps 5a and 5b. The red patch shows the heteroduplex mismatched regions for the alleles d and D. Mismatch repair results in the conversion of d into D. (See Table 19-1 for the possible segregation patterns resulting from mismatch repair.) (From H. Lodish, D. Baltimore, A. Berk, S. L. Zipursky, P. Matsudaira, and J. Darnell, Molecular Cell Biology, 3d ed. Copyright 1995 by Scientific American Books.)

Enzymatic mechanism of recombination

The isolation of mutants defective in some stage of recombination has shed much light on the enzymology of recombination. In E. coli, the products of several genes involved in general recombination (the recA, recB, recC, and recD genes) have been well characterized, as has the single-strand-DNA-binding (Ssb) protein. Mutants deficient in any of these proteins have reduced levels of recombination. In fact, three distinct recombination pathways have been identified. In addition to the major RecBCD pathway, two minor pathways, RecF and RecE, are activated in certain situations. All three pathways use the RecA protein.
Production of single-stranded DNA

An initial step in recombination is probably the nicking and unwinding of a DNA duplex by a protein complex consisting of the RecB, RecC, and RecD proteins. The complex has both helicase and nuclease activity. Figure 19-16 shows how this complex, driven by the hydrolysis of ATP, unwinds the DNA as it moves, generating single strands from a duplex molecule. The nuclease activity recognizes an 8-bp sequence, 5′-GCTGGTGG-3′, called a chi site; these chi sites appear approximately every 64 kb. As the complex unwinds the DNA, the free single-stranded DNA can be used to initiate recombination. The Ssb protein, which also takes part in DNA replication (Chapter 8), can bind to and stabilize the single strands.
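
A search for chi sites is a straightforward string scan. This minimal Python sketch (function names are hypothetical) looks for 5′-GCTGGTGG-3′ on both strands of a sequence written 5′ → 3′ on its top strand, and computes the spacing expected for any one 8-bp sequence on one strand of random DNA, 4^8 = 65,536 bp, in line with the figure of roughly 64 kb; real genomes need not follow this random expectation. The sketch reports orientation but does not model RecBCD's direction of travel.

```python
CHI = "GCTGGTGG"  # chi site, written 5'->3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_chi_sites(top_strand: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every chi site.

    '+' means chi reads 5'->3' on the given strand; '-' means it reads
    5'->3' on the complementary strand (seen here as its reverse complement).
    """
    seq = top_strand.upper()
    chi_on_bottom = reverse_complement(CHI)  # "CCACCAGC"
    hits = []
    for i in range(len(seq) - len(CHI) + 1):
        window = seq[i:i + len(CHI)]
        if window == CHI:
            hits.append((i, "+"))
        elif window == chi_on_bottom:
            hits.append((i, "-"))
    return hits


# Expected spacing of one particular 8-bp sequence on one strand of random DNA.
EXPECTED_SPACING_BP = 4 ** len(CHI)  # 65,536 bp

if __name__ == "__main__":
    print(find_chi_sites("aaGCTGGTGGaaCCACCAGCaa"))  # [(2, '+'), (12, '-')]
    print(EXPECTED_SPACING_BP)  # 65536
```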

RecA-protein-mediated single-strand exchange

The RecA protein, which also plays a role in the induction of the SOS repair system (see Chapter 16), can bind to single strands along their length, forming a nucleoprotein filament. RecA catalyzes single-strand invasion of a duplex and subsequent displacement of the corresponding strand from the duplex. The invasion and displacement take place in the presence of ATP, as shown in Figure 19-17. The displaced strand forms what is termed a D loop. Figure 19-18 depicts how this sequence of events can lead to a Holliday junction.
Branch migration

The movement of a Holliday junction (see, for instance, Figures 19-9 and 19-10), or branch migration, increases the length of heteroduplex DNA. The RuvA and RuvB proteins catalyze branch migration, driving the reaction by the hydrolysis of ATP. The RuvA protein binds to the crossover point and then is flanked by two RuvB ATPase hexameric rings, as seen by electron microscopy. Figure 19-19 depicts a model for the action of these proteins in branch migration.
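
Because branch migration sets how far heteroduplex DNA extends, it also underlies the polarity of gene conversion described earlier in the chapter. The minimal Python sketch below is a toy model, not a measured one: it assumes heteroduplex DNA starts at a fixed break point and extends to a branch point whose distance is exponentially distributed with a hypothetical mean length, then estimates how often sites at increasing distances fall inside the heteroduplex. Conversion frequency would additionally depend on mismatch repair, which the sketch ignores.

```python
import math
import random


def heteroduplex_coverage(site_distances, mean_tract_bp=1000.0, trials=20_000, seed=7):
    """Estimate how often each site lies inside heteroduplex DNA.

    Toy model: heteroduplex runs from a fixed break point (distance 0) to a
    branch point whose distance is exponentially distributed with mean
    `mean_tract_bp`. A site is covered if it is no farther than the branch point.
    """
    rng = random.Random(seed)
    covered = [0] * len(site_distances)
    for _ in range(trials):
        branch_point = rng.expovariate(1.0 / mean_tract_bp)
        for j, distance in enumerate(site_distances):
            if distance <= branch_point:
                covered[j] += 1
    return [count / trials for count in covered]


if __name__ == "__main__":
    distances = [0, 500, 1000, 2000]
    for d, freq in zip(distances, heteroduplex_coverage(distances)):
        # For this toy model the exact value is exp(-d / mean_tract_bp).
        print(f"{d:>5} bp: simulated {freq:.3f}, exact {math.exp(-d / 1000.0):.3f}")
```

Sites close to the break point are almost always covered, while distant sites are covered only when the junction happens to migrate far, which reproduces a polarity gradient.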
Resolution of Holliday junctions

Several enzymatic pathways have been identified that cleave across the point of strand exchange in a Holliday structure to yield two duplexes. RuvC is an endonuclease that resolves Holliday junctions by symmetric cleavage of the continuous (noncrossing) pair of DNA strands, as seen in Figure 19-20. In addition, the RecG and Rus proteins may provide alternative routes to cleavage. These reactions are summarized in Figure 19-21.

Figure 19-16. Model for the generation of single-stranded DNA by the RecBCD complex (purple), an ATP-driven helicase and nuclease. Two single-stranded loops are formed as the complex moves forward. The enzyme cleaves one of the strands when it encounters a chi sequence (yellow) to form single-stranded DNA with a free end. Recombination can then take place. (After A. Taylor and G. R. Smith, Cell 22, 1980, 447; from L. Stryer, Biochemistry, 4th ed. Copyright 1995 by Lubert Stryer.)

Figure 19-17. The pairing of a single-stranded DNA molecule (ssDNA) with the complementary strand of a duplex is catalyzed by the recA protein. The resulting structure is called a D loop. ATP hydrolysis releases the recA protein from DNA. (From L. Stryer, Biochemistry, 4th ed. Copyright 1995 by Lubert Stryer.)

Figure 19-18. A schematic representation of some of the steps in recombination. (a) The pairing of two homologous duplexes. (b) A nick is made by the RecB,C nuclease. The helix is partly unwound, and the single-stranded region is extended and stabilized by the Ssb protein. The RecA protein catalyzes the invasion by the single strand of the duplex. The Ssb protein aids in keeping the single strand free. (c) After nicking by the RecB,C nuclease, the free single strand from the second duplex can anneal with the first duplex. (d) DNA ligase can seal this structure. (After B. Alberts et al., Molecular Biology of the Cell. Copyright 1983 by Garland Publishing.)

Figure 19-19. Model for RuvAB-mediated branch migration. The blue spheres indicate the RuvA protein, which binds to the crossover in the Holliday junction-RuvAB complex. Two hexameric rings of RuvB flank RuvA. The two RuvB ring motors lie in opposite orientations and effect branch migration by promoting the passage of DNA. (From Carol A. Parsons, Andrzej Stasiak, Richard J. Bennett, and Stephen C. West, Structures of a Multisubunit Complex That Promotes DNA Branch Migration, Nature 374, 1995, 377.)

Figure 19-20. Model showing RuvC cleavage of two of the four strands of an antiparallel Holliday junction. The scissors depict sites where the nicking is symmetric. (a) Here nicking resolves a stacked X-structure. (b) Nicking resolves an unfolded junction. The twofold-symmetric unfolded structure shown here is formed by a 180° rotation of arms I and II of the structure shown in part a. (From Richard J. Bennett and Stephen C. West, RuvC Protein Resolves Holliday Junctions via Cleavage of the Continuous (Noncrossover) Strands, Proceedings of the National Academy of Sciences USA 92, 1995, 5639.)

Figure 19-21. Possible pathways for processing Holliday junctions in E. coli. The center of the figure shows a Holliday junction formed by homologous pairing and strand exchange by the RecA protein. In the presence of RecA, branch migration is in the same direction as the RecA strand exchange and is promoted by RuvAB. RuvC cleavage resolves this structure (upper right). Also in the presence of RecA, RecG could drive the Holliday junction backward (upper right). However, if RecA dissociates from the junction, RecG could drive the reaction forward, with resolution coming from the Rus protein (lower left). The lower right shows the results of Rus cleavage alone, after strand exchange and branch migration by RecA. (From Gary Sharples, Sau Chan, Akeel Mahdi, Matthew Whitby, and Robert Lloyd, Processing of Intermediates in Recombination and DNA Repair: Identification of a New Endonuclease That Specifically Cleaves Holliday Junctions, EMBO Journal 13, 1994, 6140.)

We have gained increasing knowledge of the molecular processes behind recombination, which produces new gene combinations through exchange between homologous chromosomes. Both genetic and physical evidence have led to several models of recombination that rely on common features: heteroduplex DNA, mismatch repair, and resolution, or splicing, of the intermediate structure to yield recombinant molecules. The process of recombination is itself under genetic control, and numerous genes that affect the process have been identified.

Concept Map
Draw a concept map interrelating as many of the following terms as possible. Note that the terms are listed in no particular order.

Chapter Integration Problem

In previous chapters, we learned about different mechanisms of generating mutations, both spontaneously and with mutagens. Describe gene conversion and how you can distinguish gene conversion from mutation.

Gene conversion is a meiotic process of directed change in which one allele directs the conversion of a partner allele into its own form. Instead of ending with equal numbers of both alleles in meiosis, gene conversion results in an excess of one allele and a deficiency of the other. Models of recombination employing heteroduplex DNA offer plausible mechanisms for gene conversion. In contrast, mutation is undirected change that can occur in both meiosis and mitosis.

Solved Problems
1. In Neurospora, an ad3 double mutant consisted of two mutant sites within the ad3 gene: site 1 on the left and site 2 on the right. This mutant was crossed to wild type, with the use of parental stocks heterozygous for two closely linked flanking loci, A and B, as follows, where + represents wild-type sequence at the mutant positions:

Most asci were of the expected type showing regular Mendelian segregations, but there were also some unexpected types, of which several examples are represented here as I through III.

Explain the likely origin of the rare types I through III according to molecular recombination models.

Ascus type I has two nonidentical sister spore pairs. Because the members of a spore pair are derived from a postmeiotic mitosis (see Chapter 6), this proves that the meiotic products in these cases must have contained both 1 2 and + + information; in other words, they must have contained heteroduplex DNA. Notice also that these spore pairs are recombinant for A and B. Therefore, it is likely that a crossover occurred between the A and B loci, that the heteroduplex DNA that constituted the crossover spanned both ad3 mutant sites 1 and 2, and that there was no correction of the heteroduplex at those sites.

Type II shows a 5:3 ratio of 1 2 doubles to + +. Here again the heteroduplexes must have spanned both sites (in a noncrossover configuration), but this time there was correction of a + +/1 2 heteroduplex to 1 2/1 2, presumably by excision and repair of the + + information (co-conversion, or double conversion).

Type III reveals another noncrossover heteroduplex configuration, and this time correction occurred only at site 1; this is revealed as a 5:3 ratio for site 1, with a 1 → + conversion in ascospore 5.
2. In fungal crosses of the following general type,

where 1 and 2 are mutant sites of a nutritional gene and M and N are flanking loci, it is possible to select rare + + prototrophic recombinants by plating on minimal medium. These prototrophs are then examined for the alleles of the flanking loci. As might be expected, the combination

is commonly encountered, but so, somewhat surprisingly, are


a. Explain the origin of these two genotypes in relation to molecular recombination models.
b. If M + + N were more common than m + + n, what would that mean?
a. It is likely that M + + N arose from a noncrossover heteroduplex that spanned site 2, followed by a correction of 2 → +.

Similarly, m + + n would be explained by a heteroduplex spanning site 1, followed by a correction of 1 → +.

b. Because M + + N arises from gene conversion at the right-hand site, it is likely that heteroduplex DNA is formed more commonly from the right than from the left, possibly because of a closer fixed break point. (Note: Prototrophs may also be formed by single-site correction in a heteroduplex spanning both sites, but then the inequality would require another explanation.)

1. Which of the following linear asci show gene conversion at the arg2 locus?

3, 4, 6

2. At the light-spore locus of Ascobolus, the 1 mutant site is in the left part of the gene and the 1 mutant site is more to the right. When crosses are made between 1- and 1-bearing strains,

asci with six light and two black spores can be selected visually. They are shown to be caused by gene conversion mostly of the following type:

What can account for this irregularity?

3. It has been proposed that the fixed break point of the heteroduplex recombination model might correspond to a promoter sequence. The following data relate to this idea. In the fungus Podospora, mutants were available in adjacent spore color genes 1 through 4, and a cross was made as follows:

(The numbers 261, 136, 42, and 115 are merely names for mutant sites.) Many asci were obtained showing gene conversion, but one that was relevant to the preceding suggestion was as follows:

Interpret this ascus in relation to the promoter idea.

4. Many mutagens increase the frequency of sister-chromatid exchange. Give possible explanations for this observation.

5. Mutations in locus 46 of the Ascomycete fungus Ascobolus produce light-colored ascospores (let's call them a mutants). In the following crosses between different a mutants, asci are observed for the appearance of dark wild-type spores. In each cross, all such asci were of the genotypes indicated:

Interpret these results in light of the models discussed in this chapter.

First, notice that gene conversion has occurred. In the first cross, a1 converted (1 : 3). In the second cross, a3 converted. In the third cross, a3 converted. Polarity obviously played a part. The results can be explained by the following map, in which hybrid DNA enters only from the left.

6. In the cross A m1 m2 B × a m3 b, the order of the mutant sites m1, m2, and m3 is unknown in relation to one another and to A/a and B/b. One nonlinear conversion ascus is obtained:

Interpret this result in light of the heteroduplex DNA theory, and derive as much information as possible about the order of the sites.
7. In the cross a1 × a2 (alleles of one locus), the following ascus is obtained:

Deduce what events may have produced this ascus (at the molecular level).
The ratios for a1 and a2 are both 3 : 1. There is no evidence of polarity, which indicates that gene conversion as part of recombination occurred. The best explanation is that two separate excision-repair events took place and, in both cases, the repair retained the mutant rather than the wild type.

8. G. Leblon and J.-L. Rossignol made the following observations in Ascobolus. Single-nucleotide-pair insertion or deletion mutations show gene conversions of the 6:2 or 2:6 type and only rarely of the 5:3, 3:5, or 3:1:1:3 type. Base-pair transition mutations show gene conversion of the 3:5, 5:3, or 3:1:1:3 type and only rarely of the 6:2 or 2:6 type.

a. In relation to the hybrid DNA model, propose an explanation for these observations.
b. Leblon and Rossignol also showed that there are far fewer 6:2 than 2:6 conversions for insertions and far more 6:2 than 2:6 conversions for deletions (where the ratios are +:m). Explain these results in relation to heteroduplex DNA. (You might also think about the excision of thymine photodimers.)

c. Finally, the researchers showed that, when a frameshift mutation is combined in a meiosis with a transition mutation at the same locus in a cis configuration, the asci showing joint conversion are all 6:2 or 2:6 for both sites (that is, the frameshift conversion pattern seems to have imposed its will on the transition site). Propose an explanation for this result.

a. and b. A heteroduplex that contains an unequal number of bases in the two strands has a larger distortion than does a simple mismatch. Therefore, the former would be more likely to be repaired. For such a case, both heteroduplex molecules are repaired (leading to 6 : 2 and 2 : 6) more often than one (leading to 5 : 3 or 3 : 5) or none (leading to 3 : 1 : 1 : 3). The preference in direction (that is, the addition rather than the subtraction of a base) is analogous to thymine dimer repair. In thymine dimer repair, the unpaired, bulged nucleotides are treated as correct and the strand with the thymine dimer is excised.

A mismatch more often than not escapes repair, leading to a 3 : 1 : 1 : 3 ascus. Transition mutations would not cause as large a distortion of the helix, and each strand of the heteroduplex should have an equal chance of repair. This would lead to 4 : 4 (two repairs in opposite directions), 5 : 3 or 3 : 5 (one repair), 3 : 1 : 1 : 3 (no repairs), and, less frequently, 6 : 2 or 2 : 6 (two repairs in the same direction).
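c. Because excision repair excises the strand opposite the larger bulge (that is, opposite the frameshift mutation), the cis transition mutation, which lies on the same strand as the frameshift, is also retained. The nearby transition site is co-converted because the excision tract is long enough to span both sites, so it shows the same 6:2 or 2:6 pattern as the frameshift.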
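The mapping from repair events to ascus types used in the answers to parts a and b can be checked by brute-force enumeration. The Python sketch below is a simplified model assumed for illustration, not taken from the source: chromatid 1 is + on both strands, chromatid 4 is m on both strands, chromatids 2 and 3 each carry a +/m heteroduplex, and each heteroduplex is either left unrepaired or corrected to + or m before the post-meiotic mitosis. All function names are hypothetical.

```python
from itertools import product

# Simplified model (an assumption for illustration): chromatid 1 is + on both
# strands, chromatid 4 is m on both strands, and chromatids 2 and 3 each carry
# a +/m heteroduplex. Each heteroduplex is left unrepaired (None) or corrected
# to "+" or "m" before the post-meiotic mitosis.
REPAIR_STATES = ("+", "m", None)


def spore_pair(state):
    """Spores produced by one heteroduplex chromatid after mitosis."""
    if state is None:
        # An unrepaired mismatch segregates at mitosis: one + and one m spore.
        return ("+", "m")
    # A corrected heteroduplex gives two identical spores.
    return (state, state)


def ascus_class(repair2, repair3):
    """Return the +:m class of the 8-spore ascus, flagging aberrant 4:4 asci."""
    pairs = [("+", "+"), spore_pair(repair2), spore_pair(repair3), ("m", "m")]
    plus = sum(spore == "+" for pair in pairs for spore in pair)
    if plus == 4 and any(a != b for a, b in pairs):
        return "3:1:1:3"  # post-meiotic segregation in two spore pairs
    return f"{plus}:{8 - plus}"


def describe(state):
    return "unrepaired" if state is None else f"corrected to {state}"


def outcome_table():
    """Map every combination of repair events to the resulting ascus class."""
    return {
        (r2, r3): ascus_class(r2, r3)
        for r2, r3 in product(REPAIR_STATES, repeat=2)
    }


for (r2, r3), cls in outcome_table().items():
    print(f"{describe(r2)}, {describe(r3)} -> {cls}")
```

Running it lists all nine combinations. Note that two corrections in opposite directions yield an ordinary-looking 4:4 ascus, so such conversion events cannot be detected from the spore ratios.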

9. At the gray locus in the ascomycete fungus Sordaria, the cross + × g1 is made. In this cross, heteroduplex DNA sometimes extends across the site of heterozygosity, and two heteroduplex DNA molecules are formed (as discussed in this chapter). However, correction of heteroduplex DNA is not 100 percent efficient. In fact, 30 percent of all heteroduplex DNA is not corrected at all, whereas 50 percent is corrected to + and 20 percent is corrected to g1. What proportion of aberrant-ratio asci will be (a) 6:2? (b) 2:6? (c) 3:1:1:3? (d) 5:3? (e) 3:5?

(a) 6:2 = 31.25 percent
(b) 2:6 = 5 percent
(c) 3:1:1:3 = 11.25 percent
(d) 5:3 = 37.5 percent
(e) 3:5 = 15 percent
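The answers to Problem 9 follow from treating the two heteroduplex molecules as independent, each uncorrected (0.3), corrected to + (0.5), or corrected to g1 (0.2). Corrections in opposite directions (probability 2 × 0.5 × 0.2 = 0.2) give a normal-looking 4:4 ascus, so each aberrant class is divided by the remaining 0.8. The Python sketch below computes the proportions under these assumptions; the function and variable names are hypothetical.

```python
from itertools import product

# Fate probabilities for each heteroduplex molecule, as given in Problem 9.
FATE_PROB = {"uncorrected": 0.30, "+": 0.50, "g1": 0.20}
# "+" spores contributed by one heteroduplex chromatid after the post-meiotic mitosis.
PLUS_SPORES = {"uncorrected": 1, "+": 2, "g1": 0}


def ascus_class(fate1, fate2):
    """Return the +:g1 class of the 8-spore ascus for two heteroduplex fates."""
    # The non-heteroduplex + chromatid adds two + spores; the g1 chromatid adds none.
    plus = 2 + PLUS_SPORES[fate1] + PLUS_SPORES[fate2]
    if plus == 4:
        both_uncorrected = fate1 == fate2 == "uncorrected"
        return "3:1:1:3" if both_uncorrected else "4:4"
    return f"{plus}:{8 - plus}"


def aberrant_proportions():
    """Probability of each aberrant class, renormalized over aberrant asci only."""
    totals = {}
    for f1, f2 in product(FATE_PROB, repeat=2):
        cls = ascus_class(f1, f2)
        totals[cls] = totals.get(cls, 0.0) + FATE_PROB[f1] * FATE_PROB[f2]
    # Opposite corrections look like normal 4:4 asci and are not scored as aberrant.
    aberrant = {cls: p for cls, p in totals.items() if cls != "4:4"}
    total = sum(aberrant.values())
    return {cls: p / total for cls, p in aberrant.items()}


for cls, frac in sorted(aberrant_proportions().items()):
    print(f"{cls}: {100 * frac:.2f} percent")
```

The printed percentages agree with the answers above: 31.25 (6:2), 5 (2:6), 11.25 (3:1:1:3), 37.5 (5:3), and 15 (3:5).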


10. Noreen Murray crossed and , two alleles of the me-2 locus in Neurospora. Included in the cross were two markers, trp and pan, which each flank me-2 at a distance of 5 m.u. The ascospores were plated onto a medium containing tryptophan and pantothenate but no methionine. The methionine prototrophs that grew were isolated and scored for the flanking markers, yielding the results shown in the table below.

Interpret these results in light of the models presented in this chapter. Be sure to account for the asymmetries in the classes.
11. In Neurospora, the cross A x × a y is made, in which x and y are alleles of the his-1 locus and A and a are mating-type alleles. The recombinant frequency between the his-1 alleles is measured by the prototroph frequency when ascospores are plated on a medium lacking histidine; this frequency is 10⁻⁵. Progeny of parental genotype are backcrossed to the parents, with the following results. All a y progeny backcrossed to the A x parent show prototroph frequencies of 10⁻⁵. When A x progeny are backcrossed to the a y parent, two prototroph frequencies are obtained: half of the crosses show 10⁻⁵, but the other half show the much higher frequency of 10⁻². Propose an explanation for these results, and describe a research program to test your hypothesis. (Note: Intragenic recombination is a meiotic function that occurs in a diploid cell. Thus, even though this organism is haploid, dominance and recessiveness could have roles in this problem.)