Evolution by gene duplication: an update
Jianzhi Zhang
Department of Ecology and Evolutionary Biology, University of Michigan, 3003 Nat. Sci. Bldg, 830 N. University Ave, Ann Arbor, MI 48109, USA
The importance of gene duplication in supplying raw genetic material to biological evolution has been recognized since the 1930s. Recent genomic sequence data provide substantial evidence for the abundance of duplicated genes in all organisms surveyed. But how do newly duplicated genes survive and acquire novel functions, and what role does gene duplication play in the evolution of genomes and organisms? Detailed molecular characterization of individual gene families, computational analysis of genomic sequences and population genetic modeling can all be used to help us uncover the mechanisms behind evolution by gene duplication.
In 1936, Bridges reported one of the earliest observations of gene duplication from the doubling of a chromosomal band in a mutant of the fruit fly Drosophila melanogaster, which exhibited extreme reduction in eye size [1]. The potential role of gene duplication in evolution was subsequently suggested and possible scenarios of duplicate gene evolution were proposed [2–4]. Ohno's seminal book in 1970, Evolution by Gene Duplication [5], further popularized this idea among biologists. It was, however, not until the late 1990s, when many genome sequences were determined and analyzed, that the prevalence and importance of gene duplication were clearly demonstrated. Through genomic sequence analysis, population genetic modeling and molecular experimentation, rapid progress has also been made in disclosing the mechanisms by which duplicate genes diverge in function and contribute to evolution. Here, I review current understandings of these mechanisms. I do not discuss genome duplication, as there have been several recent reviews of this topic [6–8].
Table 1 lists the estimated numbers of duplicated genes in completely or nearly completely sequenced genomes of representative bacteria, archaebacteria and eukaryotes. One finds that, in all three domains of life, large proportions of genes were generated by gene duplication. It is almost certain that these proportions are underestimates, because many duplicated genes have diverged so much that virtually no sequence similarity is found.

Lynch and Conery estimated that gene duplication arises (and is fixed in populations) at an approximate rate of 1 per gene per 100 million years (MY) in species including Mus musculus, D. melanogaster, Caenorhabditis elegans and Arabidopsis thaliana [9]. This rate is comparable to that of nucleotide substitution, which is 0.1–0.5 per site per 100 MY in nuclear genomes of vertebrates [10]. The above duplication rate is the gene-birth rate, which was derived from recent duplications. Many fixed duplicated genes later become pseudogenes (see Glossary) and are deleted from the genome. The rate of duplication that gives rise to stably maintained genes is the birth rate multiplied by the retention rate, which is expected to fluctuate with gene function, among other things.

Duplicated genes are often referred to as paralogous genes, which form gene families. Several authors have tabulated the distribution of gene family size for a few completely sequenced genomes [11,12] and this varies substantially among species and gene families [13]; for instance, the biggest gene family in […] is the trypsin family [12], with 111 members, whereas the biggest family in mammals is the olfactory receptor family, with about 1000 members [14,15]. From a genomic sequence analysis of the bacterium Escherichia coli, two yeasts, […], Conant and Wagner found that ribosomal proteins and transcription factors generally form smaller gene families than do other proteins, such as those controlling cell cycles and metabolism [16].

Glossary

Concerted evolution: a mode of gene family evolution in which members of a family remain similar in sequence and function because of frequent gene conversion and/or unequal crossing over.
Gene conversion: a recombination process that nonreciprocally homogenizes gene sequences.
Nonsynonymous (nucleotide substitution): a nucleotide substitution in the coding region of a gene that changes the protein sequence.
Operon: a unit of gene expression and regulation, including structural genes and control elements.
Positive (darwinian) selection: natural selection that promotes the fixation of advantageous alleles.
Pseudogene: a DNA sequence that is derived from a functional gene but has been rendered nonfunctional by mutations.
Purifying selection: natural selection that prevents the fixation of deleterious alleles.
Synonymous (nucleotide substitution): a nucleotide substitution in the coding region of a gene that does not change the protein sequence.

Corresponding author: Jianzhi Zhang (jianzhi@umich.edu).
TRENDS in Ecology and Evolution Vol. 18 No. 6 June 2003
http://tree.trends.com 0169-5347/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0169-5347(03)00033-8

Table 1. Prevalence of gene duplication in all three domains of life^a

Species                       Total number of genes   Number of duplicate genes (% of duplicate genes)   Refs
Bacteria
  Mycoplasma pneumoniae       677                     298 (44)                                           [65]
  Helicobacter pylori         1590                    266 (17)                                           [66]
  Haemophilus influenzae      1709                    284 (17)                                           [67]
Archaea
  Archaeoglobus fulgidus      2436                    719 (30)                                           [68]
Eukaryotes
  Saccharomyces cerevisiae    6241                    1858 (30)                                          [67]
  Caenorhabditis elegans      18 424                  8971 (49)                                          [67]
  Drosophila melanogaster     13 601                  5536 (41)                                          [67]
  Arabidopsis thaliana        25 498                  16 574 (65)                                        [69]
  Homo sapiens                40 580^b                15 343 (38)                                        [11]

a Use of different computational methods or criteria results in slightly different estimates of the number of duplicated genes [12].
b The most recent estimate is approximately 30 000 [61].
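The percentages in Table 1 follow directly from the two gene counts in each row. As a quick consistency check, the following Python sketch recomputes them. The counts are copied from the table; the names `TABLE_1`, `duplicate_percentage` and `recompute_percentages` are illustrative choices and not part of the original article.

```python
# Rows copied from Table 1: (species, total genes, duplicate genes, reported %).
TABLE_1 = [
    ("Mycoplasma pneumoniae", 677, 298, 44),
    ("Helicobacter pylori", 1590, 266, 17),
    ("Haemophilus influenzae", 1709, 284, 17),
    ("Archaeoglobus fulgidus", 2436, 719, 30),
    ("Saccharomyces cerevisiae", 6241, 1858, 30),
    ("Caenorhabditis elegans", 18424, 8971, 49),
    ("Drosophila melanogaster", 13601, 5536, 41),
    ("Arabidopsis thaliana", 25498, 16574, 65),
    ("Homo sapiens", 40580, 15343, 38),
]


def duplicate_percentage(total_genes, duplicate_genes):
    """Return duplicate genes as a percentage of all genes in a genome."""
    return 100.0 * duplicate_genes / total_genes


def recompute_percentages(table):
    """Return (species, recomputed rounded %, reported %) for every row."""
    return [
        (species, round(duplicate_percentage(total, dup)), reported)
        for species, total, dup, reported in table
    ]


if __name__ == "__main__":
    for species, recomputed, reported in recompute_percentages(TABLE_1):
        print(f"{species:26s} recomputed {recomputed:3d}%  reported {reported:3d}%")
```

Because the human gene total carries the caveat in footnote b, the human percentage in particular depends on which gene count is used as the denominator.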
Generation of duplicate genes
Gene duplication can result from unequal crossing over (Fig. 1a), retroposition (Fig. 1b), or chromosomal (or genome) duplication, the outcomes of which are quite different. Unequal crossing over usually generates tandem gene duplication; that is, duplicated genes are linked in a chromosome (Fig. 1a). Depending on the position of crossing over, the duplicated region can contain part of a gene, an entire gene, or several genes. In the latter two cases, introns, if present in the original genes, will also be present in the duplicated genes.

This is in sharp contrast to the result from retroposition (Fig. 1b). Retroposition occurs when a messenger RNA (mRNA) is retrotranscribed to complementary DNA (cDNA) and then inserted into the genome. As expected from this process, there are several molecular features of retroposition: loss of introns and regulatory sequences, presence of poly A tracts, and presence of flanking short direct repeats, although deviations from these common patterns do occasionally occur [17]. Another major difference from unequal crossing over is that a duplicated gene generated by retroposition is usually unlinked to the original gene, because the insertion of cDNA into the genome is more or less random. It is also impossible to have blocks of genes duplicated together by retroposition unless the genes involved are all in an operon. Only those genes that are expressed in the germ line are subject to heritable retroposition. Because promoter and regulatory sequences of a gene are not transcribed and hence not duplicated by retroposition, the resulting duplicate often lacks necessary elements for transcription and thus immediately becomes a pseudogene. Nevertheless, several retroposition-mediated duplicate genes are expressed, probably because of the chance insertion of cDNA into a genomic location that is downstream of a promoter sequence [17].

Chromosomal or genome duplication probably occurs by a lack of disjunction among daughter chromosomes after DNA replication. Substantial evidence shows that these large-scale duplications occurred frequently in plants but infrequently in animals [10]. Recent human genome analysis reveals another type of large-scale duplication, segmental duplication, which often involves 1000 to more than 200 000 nucleotides [18]. That most segmental duplications do not generate tandem repeats suggests that unequal crossing over is probably not responsible, although the exact duplication mechanism is unclear [18].
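The contrasting signatures of unequal crossing over and retroposition described above can be summarized as a small decision rule. The Python sketch below is only a toy heuristic: the function name, the equal weighting of the four features and the score thresholds are all assumptions made for illustration, and, as the text notes, real duplicates sometimes deviate from these patterns. It also assumes that the parental gene contains introns; otherwise intron loss is uninformative.

```python
def likely_mechanism(*, has_introns, has_poly_a_tract,
                     has_flanking_direct_repeats, linked_in_tandem):
    """Guess how a duplicate arose from the features listed in the text.

    All arguments are booleans describing the duplicated copy:
      has_introns                 - introns of the parental gene are retained
      has_poly_a_tract            - a poly A tract is present
      has_flanking_direct_repeats - short direct repeats flank the copy
      linked_in_tandem            - the copy lies next to the parental gene

    The equal weights and the thresholds (>= 3, <= 1) are illustrative
    assumptions, not values taken from the article.
    """
    retro_score = sum([
        not has_introns,
        has_poly_a_tract,
        has_flanking_direct_repeats,
        not linked_in_tandem,
    ])
    if retro_score >= 3:
        return "retroposition"
    if retro_score <= 1 and linked_in_tandem:
        return "unequal crossing over"
    return "ambiguous"


if __name__ == "__main__":
    print(likely_mechanism(has_introns=False, has_poly_a_tract=True,
                           has_flanking_direct_repeats=True,
                           linked_in_tandem=False))
```

Segmental and whole-chromosome duplications are deliberately left out of this sketch, because the text does not list sequence features that would identify them.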
Evolutionary fate of duplicate genes
Duplication occurs in an individual, and can be fixed or lost in the population, similar to a point mutation. If a new allele comprising duplicate genes is selectively neutral, compared with pre-existing alleles, it only has a small probability, 1/(2N), of being fixed in a diploid population, where N is the effective population size. This suggests that many duplicated genes will be lost. For those that do become fixed, fixation is time consuming, because it takes, on average, 4N generations for a neutral allele to become fixed [19]. Upon fixation, the long-term evolutionary fate of duplication will still be determined by functions of the duplicate genes. The birth and death of genes is a common theme in gene family and genome evolution [20,21], with those genes involved in the physiologies that vary greatly among species (e.g. immunity, reproduction and sensory systems) probably having high rates of gene birth and death.
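The two neutral-theory results quoted above, a fixation probability of 1/(2N) and a mean fixation time of about 4N generations, can be illustrated with a small simulation. The Python sketch below assumes an idealized Wright-Fisher population of N diploid individuals (so that census and effective size coincide) and a single new neutral copy. The function names and parameter values are illustrative and not taken from the article.

```python
import random


def simulate_new_neutral_allele(n_diploid, rng):
    """Follow one new neutral allele until it is lost or fixed.

    Each generation, the 2N gene copies of the next generation are drawn
    at random from the current allele frequency (Wright-Fisher sampling).
    Returns (fixed, generations_elapsed).
    """
    total_copies = 2 * n_diploid
    count = 1  # the allele starts as a single copy
    generations = 0
    while 0 < count < total_copies:
        freq = count / total_copies
        count = sum(1 for _ in range(total_copies) if rng.random() < freq)
        generations += 1
    return count == total_copies, generations


def estimate_fixation(n_diploid, trials, seed=0):
    """Estimate the fixation probability and the mean time to fixation."""
    rng = random.Random(seed)
    fixation_times = []
    for _ in range(trials):
        fixed, generations = simulate_new_neutral_allele(n_diploid, rng)
        if fixed:
            fixation_times.append(generations)
    probability = len(fixation_times) / trials
    if fixation_times:
        mean_time = sum(fixation_times) / len(fixation_times)
    else:
        mean_time = float("nan")
    return probability, mean_time


if __name__ == "__main__":
    N = 20
    prob, mean_time = estimate_fixation(N, trials=5000, seed=1)
    print(f"simulated fixation probability: {prob:.4f} (theory {1 / (2 * N):.4f})")
    print(f"simulated mean fixation time: {mean_time:.1f} generations (theory ~{4 * N})")
```

A small N keeps the run short. The estimates are stochastic and only approach the theoretical values as the number of trials grows.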
Gene duplication generates functional redundancy, as it is often not advantageous to have two identical genes. In other words, mutations disrupting the structure and function of one of the two genes are not deleterious and are not removed by selection. Gradually, the mutation-containing gene becomes a pseudogene, which is either unexpressed or functionless, an evolutionary fate that has been shown by population genetic modeling [22,23] as well as by genomic analysis [9,24]. After a long time, evolutionarily speaking, pseudogenes will either be deleted from the genome or become so diverged from the parental genes that they are no longer identifiable. Relatively young pseudogenes are recognizable because of sequence similarity. For example, genomic analyses have identified 2168 pseudogenes in C. elegans, or about one pseudogene for every eight functional genes [25]. More pseudogenes exist in humans, with about one pseudogene for every two functional genes in the two completely sequenced chromosomes [24]. Pseudogenization, the process by which a functional gene becomes a pseudogene, usually occurs in the first few million years after duplication if the duplicated gene is not under any selection [9].

Nevertheless, some duplicated genes were maintained in the genome for a long time for specific functions, before recently becoming pseudogenes because of the relaxation of functional constraints. For example, the size of the olfactory receptor gene family (about 1000) is similar in humans and mice, but the percentage of pseudogenes is about 60% in humans and only 20% in mice. Many olfactory receptor genes have become pseudogenes since the origin of hominoids [26]. This is probably related to the reduced use of olfaction in hominoids, which can be compensated for by other sensory mechanisms, such as better vision.

Pseudogenes do occasionally serve some function. In chickens, there is only one functional gene ([…]) encoding the heavy chain variable region of immunoglobulins, and immunoglobulin diversity is generated by gene conversion of the […] gene by the many duplicated variable region pseudogenes that occur on its 5′ side [27]. Although unlikely, pseudogenes can also be revived. In cows, the pancreatic ribonuclease gene has a paralogous gene called the seminal ribonuclease gene, which is expressed in semen. These two genes are the result of gene duplication that occurred before the radiation of ruminants at least 35 MY ago. In all other ruminants, the seminal ribonuclease gene either contains deleterious mutations or is not expressed [28–30], which suggests that the seminal ribonuclease gene had been a pseudogene for much of its history, but was revived recently in the cow. How this could have happened is unclear.

In my view, there have not been sufficient studies of pseudogenization, probably because pseudogenes are regarded as uninteresting. In fact, lineage-specific pseudogenization, such as the aforementioned example of olfactory receptor genes of hominoids, provides rich information about organismal evolution.

Fig. 1. Two common modes of gene duplication. (a) Unequal crossing over, which results in a recombination event in which the two recombining sites lie at nonidentical locations in the two parental DNA molecules. (b) Retroposition, which occurs when a messenger RNA (mRNA) is retrotranscribed to complementary DNA (cDNA) and then inserted into the genome. Squares represent exons and bold lines represent introns.
[Figure 1 labels: Unequal crossing over; Transcription and RNA splicing (intron sequences are spliced out during mRNA maturation); Mature mRNA; Reverse transcription; cDNA; Random insertion into the genome (the parental gene resides in a different chromosome).]
Conservation of gene function 
The presence of duplicate genes is sometimes beneficial simply because extra amounts of protein or RNA products are provided. This applies mainly to strongly expressed genes the products of which are in high demand, such as rRNAs and histones. How can two paralogous genes maintain the same function after duplication? One way is by gene conversion. Under frequent gene conversion, two paralogous genes will have very similar sequences and functions, and this mode of evolution is often referred to as concerted evolution [10]. Alternatively, strong purifying selection against mutations that modify gene function can also prevent duplicated genes from diverging.

Purifying selection can be distinguished from gene conversion by an examination of synonymous (or silent) nucleotide differences among duplicated genes. Synonymous differences are more or less immune to selection and cannot be reduced by purifying selection. But they can be removed by gene conversion, because gene conversion homogenizes DNA sequences regardless of whether the differences are synonymous or nonsynonymous (amino-acid-altering). Using this strategy, Nei and his associates re-examined several large gene families that were previously thought to be under concerted evolution. Their results suggest that purifying selection is much more important than is gene conversion in maintaining common functions of these duplicated genes [21,31]. A recent population genetic analysis also suggested that the conditions for gene conversion to be favored selectively are relatively restrictive [32].
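The test described above rests on separating synonymous from nonsynonymous differences between two paralogous coding sequences. The Python sketch below shows a deliberately crude version of that classification: it compares two aligned, gap-free coding sequences codon by codon using the standard genetic code and counts only codons that differ at a single position. The function name and the handling of codons with several differences are assumptions of this sketch. It is not the estimation procedure used in the cited studies, which correct for the number of synonymous and nonsynonymous sites.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_differences(seq_a, seq_b):
    """Count synonymous and nonsynonymous single-site codon differences.

    Both sequences must be aligned, in frame and of equal length.
    Codons differing at two or three positions are tallied separately
    under "multiple", because their classification depends on the
    assumed order of changes.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    counts = {"synonymous": 0, "nonsynonymous": 0, "multiple": 0}
    for pos in range(0, len(seq_a), 3):
        codon_a = seq_a[pos:pos + 3].upper()
        codon_b = seq_b[pos:pos + 3].upper()
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff == 0:
            continue
        if n_diff > 1:
            counts["multiple"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    # GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
    print(classify_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

In the logic of the text, paralogs with many synonymous but few nonsynonymous differences point to purifying selection, whereas paralogs with few differences of either kind are compatible with recent gene conversion or with recent duplication.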
Unless the presence of an extra amount of gene product is advantageous, two genes with identical functions are unlikely to be stably maintained in the genome [33]. Theoretical population genetics predicts that both duplicates can be stably maintained when they differ in some aspects of their functions [33], which can occur by subfunctionalization, in which each daughter gene adopts part of the functions of their parental gene [34–36]. One form of subfunctionalization that is potentially important in the evolution of development is division of gene expression after duplication ([37], Fig. 2). Several duplicate genes have been demonstrated to evolve following this model of subfunctionalization [37]. For example, zebrafish […] are a pair of transcription factor genes generated by a chromosomal segmental duplication that occurred in the lineage of ray-finned fish. Zebrafish […] is expressed in the pectoral appendage bud, whereas […] is expressed in a specific set of neurons in the hindbrain/spinal cord [37]. The sole […] gene of the mouse, orthologous to both genes of the zebrafish, is expressed in both pectoral appendage bud and hindbrain/spinal cord. Changes of gene expression after gene duplication appear to be a general rule rather than an exception [38,39] and these changes often occur quickly after gene duplication [39].

Subfunctionalization can also occur at the protein function level and can lead to functional specialization when one of the duplicate genes becomes better at performing one of the original functions of the progenitor gene [40]. A recent study illustrates how a specialized digestive
Fig. 2. Division of expression after gene duplication. Squares represent genes, closed ovals represent cis-acting elements that regulate gene transcription, and open ovals represent deactivated cis-elements. Consider a gene that is expressed in tissues T1 and T2, with a cis-acting regulatory element A1 controlling the expression in T1 and A2 controlling the expression in T2. Following gene duplication, one daughter gene might lose the A1 element whereas the other gene might lose A2, so that each is expressed in only one of the two tissues. Under such conditions, both genes are necessary and therefore will be maintained in the genome.
[Figure 2 labels: Gene duplication; Complementary degenerate mutations; Expressed in tissues T1 and T2; Expressed in T2; Expressed in T1.]
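The argument of Fig. 2 can be written out as a few lines of set logic. In the Python sketch below, each gene copy is represented by the set of regulatory elements it still carries, using the caption's element names A1 and A2 and tissue names T1 and T2. The function names and outcome labels are illustrative assumptions. The sketch models only the presence or absence of expression, not its level or any selection dynamics.

```python
from itertools import product

# Regulatory element -> tissue whose expression it controls (from Fig. 2).
TISSUE_OF_ELEMENT = {"A1": "T1", "A2": "T2"}
ANCESTRAL_TISSUES = set(TISSUE_OF_ELEMENT.values())


def expressed_tissues(active_elements):
    """Tissues in which a copy with these active elements is expressed."""
    return {TISSUE_OF_ELEMENT[element] for element in active_elements}


def classify_pair(copy_a, copy_b):
    """Classify the state of two duplicate copies.

    A copy counts as necessary when it is expressed in at least one
    tissue that the other copy does not cover.
    """
    tissues_a = expressed_tissues(copy_a)
    tissues_b = expressed_tissues(copy_b)
    if tissues_a | tissues_b != ANCESTRAL_TISSUES:
        return "ancestral expression lost"
    a_needed = bool(tissues_a - tissues_b)
    b_needed = bool(tissues_b - tissues_a)
    if a_needed and b_needed:
        return "subfunctionalized"
    if a_needed or b_needed:
        return "one copy dispensable"
    return "redundant"


def outcomes_after_one_loss_each():
    """Enumerate the cases in which each copy loses exactly one element."""
    full = set(TISSUE_OF_ELEMENT)
    results = {}
    for lost_a, lost_b in product(sorted(full), repeat=2):
        results[(lost_a, lost_b)] = classify_pair(full - {lost_a}, full - {lost_b})
    return results


if __name__ == "__main__":
    for (lost_a, lost_b), outcome in outcomes_after_one_loss_each().items():
        print(f"copy 1 loses {lost_a}, copy 2 loses {lost_b}: {outcome}")
```

Only the two complementary loss patterns leave both copies necessary, which is the situation the caption describes. When both copies lose the same element, expression in one ancestral tissue disappears altogether.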
