Evolution by gene duplication: an update
Department of Ecology and Evolutionary Biology, University of Michigan, 3003 Nat. Sci. Bldg, 830 N. University Ave, Ann Arbor, MI 48109, USA
The importance of gene duplication in supplying raw genetic material to biological evolution has been recognized since the 1930s. Recent genomic sequence data provide substantial evidence for the abundance of duplicated genes in all organisms surveyed. But how do newly duplicated genes survive and acquire novel functions, and what role does gene duplication play in the evolution of genomes and organisms? Detailed molecular characterization of individual gene families, computational analysis of genomic sequences and population genetic modeling can all be used to help us uncover the mechanisms behind evolution by gene duplication.
In 1936, Bridges reported one of the earliest observations of gene duplication from the doubling of a chromosomal band in a mutant of the fruit fly Drosophila melanogaster, which exhibited extreme reduction in eye size. The potential role of gene duplication in evolution was subsequently suggested and possible scenarios of duplicate gene evolution were proposed [2–4]. Ohno's seminal book in 1970, Evolution by Gene Duplication, further popularized this idea among biologists. It was, however, not until the late 1990s, when many genome sequences were determined and analyzed, that the prevalence and importance of gene duplication were clearly demonstrated. Through genomic sequence analysis, population genetic modeling and molecular experimentation, rapid progress has also been made in disclosing the mechanisms by which duplicate genes diverge in function and contribute to evolution. Here, I review current understanding of these mechanisms. I do not discuss genome duplication, as there have been several recent reviews of this topic [6–8].
Table 1 lists the estimated numbers of duplicated genes in completely or nearly completely sequenced genomes of representative bacteria, archaebacteria and eukaryotes. One finds that, in all three domains of life, large proportions of genes were generated by gene duplication. It is almost certain that these proportions are underestimates, because many duplicated genes have diverged so much that virtually no sequence similarity is found. Lynch and Conery estimated that gene duplication arises (and is fixed in populations) at an approximate rate of 1 gene
Mus musculus, D. melanogaster, Caenorhabditis elegans, Arabidopsis thaliana
This rate is comparable to that of nucleotide substitution, which is 0.1–0.5 site
in nuclear genomes of vertebrates. The above duplication rate is the gene-birth rate, which was derived from recent duplications. Many fixed duplicated genes later become pseudogenes (see Glossary) and are deleted from the genome. The rate of duplication that gives rise to stably maintained genes is the birth rate multiplied by the retention rate, which is expected to fluctuate with gene function, among other things.
Duplicated genes are often referred to as paralogous genes, which form gene families. Several authors have tabulated the distribution of gene family size for a few completely sequenced genomes [11,12] and this varies substantially among species and gene families; for instance, the biggest gene family in
Glossary
Concerted evolution:
a mode of gene family evolution in which members of a family remain similar in sequence and function because of frequent gene conversion and/or unequal crossing over.
Gene conversion:
a recombination process that nonreciprocally homogenizes gene sequences.
Nonsynonymous (nucleotide substitution):
a nucleotide substitution in the coding region of a gene that changes the protein sequence.
Positive (darwinian) selection:
natural selection that promotes the fixation of advantageous alleles.
Pseudogene:
a DNA sequence that is derived from a functional gene but has been rendered nonfunctional by mutations.
Purifying selection:
natural selection that prevents the fixation of deleterious alleles.
a unit of gene expression and regulation, including structural genesand control elements.
Synonymous (nucleotide substitution):
a nucleotide substitution in the coding region of a gene that does not change the protein sequence.
Table 1. Prevalence of gene duplication in all three domains of life
Total number of genes; Number of duplicate genes (% of duplicate genes); Refs
Bacteria
Use of different computational methods or criteria results in slightly different estimates of the number of duplicated genes.
The most recent estimate is
Jianzhi Zhang (firstname.lastname@example.org).
TRENDS in Ecology and Evolution
Vol.18 No.6 June 2003 292
© 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0169-5347(03)00033-8