BARCODING FUNGI
Blackwell Publishing Ltd
Abstract
Our study evaluated in silico the potential of 14 mitochondrial genes encoding the subunits of the respiratory chain complexes, including cytochrome c oxidase I (CO1), as DNA barcodes for the Basidiomycota. Fifteen complete and partial mitochondrial genomes were recovered and characterized in this study. Mitochondrial genes showed high values of molecular divergence, indicating a potential for the resolution of lower-level relationships. However, numerous introns occurred in CO1 as well as in six other genes, potentially interfering with polymerase chain reaction amplification. Considering these results and given the minimal length of 600 bp that is optimal for a fungal barcode, the genes encoding the ATPase subunit 6, the cytochrome oxidase subunit 3 and the NADH dehydrogenase subunit 6 have the most promising characteristics for DNA barcoding among the mitochondrial genes studied. However, biological validation on two fungal data sets indicated that no single mitochondrial gene gave a better taxonomic resolution than the ITS, the region already widely used in fungal taxonomy.
Keywords: Basidiomycota, CO1, DNA barcoding, ITS, mitochondrial genes
Received 31 October 2008; revision received 26 January 2009; accepted 30 January 2009
(Bruns et al. 1991; Bridge et al. 2005). The ITS can easily be amplified from many fungal taxa using a limited set of primers (White et al. 1990; Gardes & Bruns 1993). Furthermore, based on the finding that differences among species are generally consistently larger than those within species, ITS sequences have been frequently utilized for fungal taxa delimitation and identification (Bruns et al. 1991; Guarro et al. 1999; Bruns 2001; Katsu et al. 2004). This level of polymorphism makes the ITS a logical candidate for DNA barcoding of fungi (Zeng & De Hoog 2008).

However, utilization of this region for taxonomic and phylogenetic purposes in fungi can present some limitations. On the one hand, a lack of variation among closely related species has been observed even when the species’ borders were well defined and supported by other molecular, morphological and/or biological data. For example, molecular taxonomic studies have demonstrated the inefficiency of the ITS to resolve well-characterized species in Heterobasidion, Armillaria, Fusarium and Penicillium genera (Chillali et al. 1998; Skouboe et al. 1999; Bruns 2001; Seifert et al. 2007). Furthermore, in some cases, the ITS regions could not reveal known cryptic species in fungal species complexes (Tian et al. 2004; Seifert et al. 2007); the resolution of these taxonomically challenging and economically important groups should be one of the most important attributes of a fungal DNA barcode. On the other hand, intraspecific and even intra-individual variations have been found in some fungal groups for these loci (O’Donnell & Cigelnik 1997; Aanen et al. 2001; Okabe & Matsumoto 2003; Lim et al. 2008). Such polymorphisms in the ITS can either result from differences within nuclei (heterogeneity among repeats) or from differences between nuclei (dikaryotic and multinucleate fungi) (Aanen et al. 2001; Okabe & Matsumoto 2003). Heterogeneity among repeats creates complications for direct sequencing of ITS-PCR products because of the occurrence of multiple different ITS copies in a single fungal isolate (Aanen et al. 2001; Matheny et al. 2007) and could make it difficult to accurately define taxonomic groups at the species level (Aanen et al. 2001; Smith et al. 2007).

Mitochondrial DNA (mtDNA) has a simple genetic structure, a limited exposure to genetic recombination, and rapid rates of evolution compared with nuclear DNA (Xu & Singh 2005; Waugh 2007) and is thus well suited to design molecular markers for the study of closely related taxa (Hebert et al. 2003). A 648-bp region at the 5′-end of the mitochondrial gene encoding cytochrome c oxidase I (CO1-5′) has been proposed as the core of global bio-identification systems for eukaryotes (Hebert et al. 2003; Hebert & Gregory 2005; Waugh 2007). CO1-5′ appears to possess a greater potential as a species-specific marker than any other mitochondrial gene, as confirmed by taxonomic resolution (> 95%) in most Metazoa groups, including Lepidoptera and birds (Hebert et al. 2003, 2004a, b; Cywinska et al. 2006). Promising results have recently been published using mitochondrial coding gene regions, alone or concatenated, for fungal taxonomy and systematics (Paquin et al. 1997; Grasso et al. 2006; Seifert et al. 2007). The CO1 gene was suitable as a barcode for discriminating fungal species in Penicillium, a taxonomically challenging genus within ascomycetous fungi (Min & Hickey 2007; Seifert et al. 2007). Nevertheless, despite this specific case study, the use of CO1-5′ as a DNA barcoding region for delimiting closely related fungal species has not been extensively studied. Some characteristics of the fungal mitochondrial genome represent significant obstacles for the elaboration of a barcode. Copies of mtDNA genes (or gene regions) can occur in the nuclear genome [the so-called nuclear mitochondrial DNA (NUMT)] (Burger et al. 2003; Richly & Leister 2004; Song et al. 2008) and in the mitochondrial genome (Martin et al. 2007). Group I or II introns are present in the mitochondrial genes with variable frequencies depending on the species sampled, with a strong preference for protein-coding genes (Paquin et al. 1997; Burger et al. 2003; Yan & Xu 2005). Furthermore, functional constraints, unequal substitution patterns and heterogeneous evolutionary rates found in mtDNA genes can potentially affect its usefulness for species delimitation (Paquin et al. 1997; Roe & Sperling 2007).

In this study, we present a bioinformatics approach that aims to evaluate the potential of 14 mitochondrial genes commonly found in fungal mitochondrial genomes as DNA barcodes for the Basidiomycota. These genes encode hydrophobic subunits of the respiratory chain complexes I, III and IV [including apocytochrome b (cob); cytochrome oxidase subunits 1, 2 and 3 (CO1 to CO3); NADH dehydrogenase subunits 1, 2, 3, 4, 4L, 5 and 6 (nad1 to nad6, nad4L), and ATPase subunits 6, 8 and 9 (atp6, atp8 and atp9)]. Our strategy first consisted in carrying out an inventory of the publicly available mitochondrial sequence data for this group of fungi to recover a maximum of complete mitochondrial genomes. We evaluated the structure of each coding gene to assess (i) intron position and size; (ii) the occurrence of partial or full length copies of the gene in both nuclear and mtDNA; and (iii) the information provided in the context of barcoding. We then assessed and compared the potential for species level taxonomic resolution with the resolution obtained for the ITS and 28S loci by sequencing 38 strains within the Chrysomyxa and Melampsora genera which represent taxonomically challenging groups in Pucciniomycotina.

Materials and methods

Recovery of mtDNA genome sequences and protein coding genes for Basidiomycota

Three strategies were designed to recover the entire mitochondrial genome of 17 Basidiomycota species from various public databases (Fig. 1). In the first strategy (Fig. 1-1), six
complete and annotated mtDNA genomes [Cryptococcus neoformans var. grubii (strain H99), Moniliophthora perniciosa, Pleurotus ostreatus, Schizophyllum commune, Tilletia indica and Ustilago maydis] were obtained from the National Center for Biotechnology Information (NCBI) Genomes databases [http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/fu.html], and two complete but non-annotated genomes [C. neoformans var. neoformans (strain JEC21) and Puccinia graminis f. sp. tritici] were obtained from the Stanford Genome Technology Center and the Broad Institute websites, respectively. The second strategy (Fig. 1-2) was aimed at recovering sequence contigs that correspond to the mitochondrial genome of Coprinus cinereus and Cryptococcus neoformans var. gattii (strain R265) within assembled contigs from whole genomes. The available expressed sequence tags (ESTs) for C. cinereus (15 715 ESTs) were screened (tblastx, E-value threshold at 1e-20) for the presence of mitochondrial ESTs using a set of mitochondrial protein coding genes as well as rRNA and tRNA of C. neoformans var. grubii (strain H99), M. perniciosa, S. commune and U. maydis as queries. The identification of the corresponding mitochondrial contig was performed by searching the C. cinereus whole-genome sequence using the tblastx program (E-value threshold at 1e-20) and submitting each EST recovered in the last search. Likewise, the corresponding mitochondrial contig of C. neoformans var. gattii (strain R265) was identified using C. neoformans var. grubii (strain H99) mitochondrial genes as query. The third strategy (Fig. 1-3) consisted in the construction of a mitochondrial genome assembly for seven other species (Laccaria bicolor (Martin et al. 2008), Phanerochaete chrysosporium, Phakopsora pachyrhizi, Postia placenta, Sporidiobolus salmonicolor, Sporobolomyces roseus and Ustilago hordei) using available whole-genome shotgun sequences (WGS) recovered from Trace Archive databases at the NCBI website (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?). For the latter fungi, all traces were assembled using the Pcap.Rep program (Huang et al. 2006) with default parameters. Each whole-genome assembly obtained was screened for potential mitochondrial contig(s) using the method described above (strategy 2). If more than one single contig was retrieved, a new assembly was performed with the Cap3 program implemented with default parameters (Huang & Madan 1999). In order to improve the comprehensiveness of this last mitochondrial genome assembly, an additional blastn search (E-value threshold at 1e-20) was performed against the Trace Archive database using the 14 genes encoding subunits of the respiratory chain complexes of each complete and annotated mtDNA genome [C. neoformans (strain H99), M. perniciosa, P. ostreatus, S. commune and U. maydis] as queries. All similar sequences retrieved by this search were appended to the previous assembly with one additional run of Cap3 contig assembly (Fig. 1, step 4).

In addition, 24 mitochondrial-protein encoding sequences (complete CDS features) for nine strains of eight other
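
The screening step used in strategies 2 and 3 (keep only BLAST hits below an E-value cutoff, then collect the contigs they point to) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the search results are available in the standard 12-column tab-separated BLAST layout, and it assumes a cutoff of 1e-20. The function names `filter_hits` and `candidate_contigs` are hypothetical.

```python
import csv
import io

# Assumed cutoff for illustration; adjust to the threshold actually used.
E_VALUE_THRESHOLD = 1e-20


def filter_hits(tabular_text, max_evalue=E_VALUE_THRESHOLD):
    """Return (query, subject, evalue) for hits at or below max_evalue.

    Assumes 12 tab-separated columns per hit, with the query id in
    column 1, the subject id in column 2 and the E-value in column 11.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            hits.append((row[0], row[1], evalue))
    return hits


def candidate_contigs(hits):
    """Collect the distinct subject ids, i.e. candidate mitochondrial contigs."""
    return sorted({subject for _, subject, _ in hits})
```

Contigs returned by `candidate_contigs` would still need manual inspection, since nuclear copies of mitochondrial genes can also produce strong hits.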
CO2 (AF534130)
cob (AY560608)
nad4L (AF538353)
Rhodotorula glutinis atp8 (AB248915)
atp9 (AB248915)
CO3 (AB248915)
nad5 (AB248915)
nad6 (AB248915)
Suillus luteus atp6 (AF002135)
Suillus sinuspaulianus CO3 (AF002136)
Trimorphomyces papilionaceus cob (X85236)
nad1 (X73821)
Table 2 General features of mitochondrial genomes for the Basidiomycota considered in this study
Columns: Species; Final assembly size (nt); No. of contigs; G+C content %; rns§; rnl§; No. of tRNA; Rps3§; Total; No. of putative group-I intron; No. of putative group-II intron
*in addition to the 14 genes encoding the hydrophobic subunits of the respiratory chain complexes common to fungal mitochondrial
genomes CO1–2, cob, nad1–6, atp6–9
†incomplete mitochondrial genome sequence, some mitochondrial protein-coding genes not retrieved
‡genome recovered using the reconstruction strategies 2, 3 and 4 of the bioinformatic process described in this study (Fig. 1)
(...) missing data
§target gene (√) retrieved or (—) not retrieved
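
Table 2 includes a G+C content column for each assembly. How that value was computed is not stated; a minimal sketch, assuming it is the percentage of G and C among unambiguous bases pooled over all contigs of an assembly, could look like this. The function names are hypothetical.

```python
def gc_content(sequence):
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


def assembly_gc(contigs):
    """G+C content pooled over several contigs (length-weighted by construction)."""
    return gc_content("".join(contigs))
```

Pooling the contigs before counting weights each contig by its length, which seems the natural choice for assemblies made of many non-overlapping contigs.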
subunit (rps3) was identified in only 9 out of the 15 mitochondrial genomes screened. The set of mitochondrial tRNAs ranged from 3 tRNA genes for P. chrysosporium to 28 in the M. perniciosa mitochondrial genome. All 14 genes encoding the hydrophobic subunits of the respiratory chain complexes could be localized in all studied genomes, except atp8 in P. chrysosporium, CO2 in P. placenta and nad6 in P. pachyrhizi and S. roseus. Similarly, genes encoding the 23S and 16S ribosomal RNAs of the large and small subunits of the ribosome (rnl and rns, respectively) were present in all the mtDNA genomes investigated, except rnl, which was absent in P. chrysosporium. Such differences in gene patterns among different mitochondrial genomes reflected either the absence of one specific gene in the mitochondrial genome, as previously reported for the atp9 gene in Podospora anserina (Yan & Xu 2005), or the limitation of the data-mining and reconstruction methods applied to incomplete Trace Archive data sets. The mitochondrial genomes of P. placenta (CO2 and rps3 missing), S. roseus (nad6 and rps3 missing, atp9 incomplete) and P. chrysosporium (atp8, rnl and rps3 missing, CO1, CO2, CO3, cob, nad1 and nad5 incomplete) are most likely incomplete since they resulted from the concatenation of 4, 26 and 12 non-overlapping contigs, for a total size of 114.1, 45.4 and 40.1 kb, respectively (Table 2).

Characterization of 14 genes encoding the subunits of the respiratory chain complexes

Half of the 14 genes encoding the subunits of the respiratory chain complexes (CO1 and CO2, cob, nad1, nad2, nad4 and nad5) contained insert locations, ranging from one (in nad2) to 18 (in CO1), for a total of 42 (Fig. 2). One hundred and twelve of these inserts were identified as probable group I introns, among which 97 contained putative open reading frames (ORFs) encoding LAGLIDADG or GIY-YIG endonuclease proteins. Two group II introns were also found in this data set. The insert retrieved in the nad4 gene of P. graminis contained one ORF which encodes a group II intron maturase. Likewise, based on secondary structure prediction, the 11th CO1 insert found in P. placenta was identified as a group-II intron. Interestingly, this last putative group-II intron possesses an ORF which encodes a LAGLIDADG endonuclease identical to those found in group I introns. Such a family of group-II introns, e.g. encoding LAGLIDADG ORFs typical
Fig. 2 Number, position and characterization of inserts in seven mitochondrial genes encoding subunits of the respiratory chain complexes.
Table 3 Properties of the 14 genes encoding subunits of the respiratory chain complexes
of group-I introns has been observed in the rnl and rns genes of Agrocybe aegerita and Trimorphomyces papilionaceus, respectively (Toor & Zimmerly 2002). Patterns of presence/absence for these ORFs resulted in significant length variations for the different inserts found in those genes (from 200 bp for the S. roseus insert in cob to 4635 bp for the third C. neoformans (strain R265) insert in CO1).

As observed in other fungal mtDNA studies (Vaughn et al. 1995; Paquin et al. 1997), we confirmed the prevalence of introns in the CO1 and cob genes. We furthermore report on the wide distribution of large putative introns in other genes such as nad1, nad5 and CO2 (Fig. 2). This contrasts with an earlier report that CO2, along with atp9 and nad6, rarely contained introns in fungal mtDNA (Paquin et al. 1997).

Analysis of mtDNA copies of the 14 mitochondrial genes

Over 1180 putative copies of the 14 genes encoding subunits of the respiratory chain complexes considered in this study were localized following our fungal genome searches (Table 3). Size of these copies ranged from 20 nt to the full length of the query sequence. Two classes of copies were distinguishable. The first class comprised entire (mostly perfect) copies that matched almost 90% of the query sequence length and exhibited up to 95% identity (blastn E-value of 0.0). Members of this category were rare and confined to three species: C. neoformans (strain H99) (one complete copy of atp8, CO1, cob, nad1, nad2 and nad6), P. graminis (one complete copy of atp6, atp8, CO2 and nad3) and S. roseus (one complete copy of atp6, three of CO2 and two of nad3 and nad4L; Table 3). The six complete copies found in C. neoformans (strain H99) were distributed across four supercontigs (supercontig 1.124 for CO1 and atp8, 1.80 for nad1 and cob, 1.97 for nad2 and 1.76 for nad6). Strong similarities (99%) were observed between these supercontigs and homologous regions in the mitochondrial genome of C. neoformans strain H99. Similarly, all the copies found in P. graminis, except nad3, were clustered in one single supercontig (2.15), but in
this case, only a small part of this supercontig (0.2% of the total supercontig length) showed a strong homology (98%) with the P. graminis mitochondrial genome. Given the strong identities observed in these fungi between these supercontigs and homologous regions in their respective mitochondrial genomes, two hypotheses can be advocated. First, the presence of copies identical to mtDNA sequences in nuclear DNA can be attributed to recent DNA duplications and transfers from mitochondria to chromosomes (NUMTs). Such invasion of nuclear DNA by mtDNA appears as a continuous evolutionary process that results in the occurrence of complete, rearranged and/or fragmented copies of variable sizes, evenly distributed within and among chromosomes (Richly & Leister 2004; Bullerwell & Lang 2005; Pamilo et al. 2007). Examples of recent transfers have been observed in the ascomycete yeast Saccharomyces cerevisiae (Ricchetti et al. 1999) and in other nuclear genomes from eukaryotes, such as plants and humans (Mourrier et al. 2001; Stupar et al. 2001; IRGSP 2005). Second, mis- or unassembled parts of the mitochondrial genome sequence can be retrieved in the complete genome assembly. These technical problems might have arisen during the assembly of a whole genome sequence, either from insufficient read quality or from the presence of nucleotide repeats. In fact, alignment errors (computational or clone-induced from chimera) resulting in genome region misassemblies and producing these copies cannot be completely excluded. Similar computational errors probably occurred in the S. roseus mitochondrial genome assembly, since perfect copies of the entire CO2, nad3 and nad4L genes were retrieved in this genome.

The second class of copies included short repetitive elements scattered throughout mitochondrial genomes. The mitochondrial genomes of C. cinereus, L. bicolor, M. perniciosa and P. placenta contained such elements that originated from the CO1, CO2, cob, nad1 and nad5 genes (Table 3). Most of these sequences (six in P. placenta, 34 in M. perniciosa and 62 in L. bicolor) of 20–92 nt were repeated from 15 to more than 30 times and exhibited short hairpin structures. Such secondary structures have previously been reported in mitochondrial fungal genomes and are described as double-hairpin elements (DHE) (Paquin et al. 1997).

Patterns of evolution

To better understand the pattern and rate of evolution of the mitochondrial loci, we assessed sequence divergence levels between 14 ingroup taxa (C. cinereus, C. neoformans (strain JEC21), C. neoformans (strain H99), C. neoformans (strain R265), L. bicolor, M. perniciosa, P. pachyrhizi, P. ostreatus, P. placenta, P. graminis, S. commune, S. roseus, T. indica, and U. maydis) for each of the 14 mitochondrial genes encoding the subunits of the respiratory chain complexes. We found high interspecific mutational variation in each gene as measured by K2P, with distances ranging from 0.36 (± 0.11) to 0.68 (± 0.23), and by percentage of variable sites (62.6% for atp9 and 77.4% for nad3; Fig. 3a). The substantial divergence values observed at higher taxonomic levels (e.g. between genera within Basidiomycota) suggest that these 14 mitochondrial genes may be useful markers for the resolution of lower-level relationships (e.g. between species). Such high divergence levels were expected for mitochondrial protein-coding genes (Moriyama & Powell 1997; Hebert et al. 2003). As a comparison, at similar taxonomic levels (between genera within Basidiomycota), overall means of K2P distances observed for the ribosomal nuclear RNA genes (18S and 28S) were 0.12 (± 0.05) and 0.25 (± 0.05), respectively. In contrast, numerous insertions/deletions in the ITS sequences obtained for the same taxa prevented accurate nucleotide alignment and determination of the molecular evolution pattern.

We used likelihood-ratio tests to find the best evolutionary model fitting each of the genes under examination. For each gene, evolutionary models that accounted for unequal base frequencies provided a significantly better fit to the data. Base compositional bias was lower for atp9 and CO1 (observed G + C content of 40% and 37%, respectively) relative to the 12 other mitochondrial genes examined, which exhibited a mean G + C content of 30% (± 3%). At this level, such A-T bias may result in an increase of homoplasy (Lockhart et al. 1994; Foster & Hickey 1999; Rokas et al. 2002). Limitations in the type of changes at several nucleotide sites induced by homoplasy could result in asymmetrical patterns of among-base substitution rates (Collins et al. 1994). Only the 18S, 28S and atp8 genes fitted relatively simple models assuming two [Tamura–Nei (TrN) model for the 18S and 28S genes] or three substitution rates [transitional model (TIM) for atp8]. The 13 remaining genes fitted transversional (TVM) and general time reversible (GTR) models, which comprise five and six classes of substitution types, respectively (Fig. 3b). The complexity of these models was furthermore emphasized by a parameter accounting for rate heterogeneity across sites. In the five cases in which no invariable site parameter (I) was added to the ML model (atp8, atp9, nad3, nad4L and 28S; Fig. 3b), the shape parameter (alpha) of the gamma-distributed rates component was low, denoting strong rate heterogeneity among sites (i.e. a more uneven distribution of rates among sites). We noted a significant correlation between the alpha and I parameters (r2 = 0.74; P < 0.05), which likely resulted from the fact that, as more sites were allocated to the invariant site category, the remaining sites showed less rate heterogeneity. Low alpha values correspond to genes with a few sites evolving at a very high rate, with the remaining sites changing at a very slow rate. Thus, given these model parameter tendencies (e.g. base composition heterogeneity, substitution bias and low alpha values), the substantial variation observed in these genes appears to be concentrated at a few sites. These sites are likely to have multiple substitutions with a
Fig. 3 Attributes of 14 genes encoding subunits of the mitochondrial respiratory chain complexes. (a) Kimura-2-parameter (K2P)
interspecific distances between pairwise sequences with the standard deviation plotted against the percentage of variable sites found for
each gene set; lines below K2P values indicate that the means were not significantly differentiated by a Tukey test. (b) Gamma-shape
parameters, proportion of invariable sites, and among-bases substitution rates components of the best-fitting models for the 14 genes
considered. The classes of substitution types varied according to the colours depicted in the bottom right box. The upper right box contains
the lengths of the different data sets.
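
For readers unfamiliar with the Kimura-2-parameter (K2P) distance plotted in Fig. 3a, the sketch below computes it for one pair of aligned sequences using the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is an illustration only: the software the authors used is not named here, the function name `k2p_distance` is hypothetical, and skipping any column with a gap or ambiguity code is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura-2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

The corrected distance exceeds the raw proportion of differing sites, and the gap widens as sequences diverge, which is why heavily saturated alignments can make the correction undefined.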
resulting reduction in the number of variable and/or alternative character states available. This translates into an increased sensitivity to homoplasy (Cummings et al. 1995; Ballard & Whitlock 2004). Such mutational saturation tendency raised the possibility that convergence in base composition between unrelated taxa could lead to incorrect species delimitations (Roe & Sperling 2007).

Efficiency of the mitochondrial DNA barcode candidates

Considering the absence of introns in the in silico analysis and the potential for high divergence levels, seven fungal mitochondrial genes, atp6, atp8, atp9, CO3, nad3, nad4L and nad6, had potential as DNA barcodes. With an optimal length for a barcode of approximately 600 nt (Min & Hickey 2007), the set is reduced to three genes: atp6, CO3 and nad6. Atp6 was successfully used for systematics in the Boletales (Agaricomycotina, Basidiomycota) (Kretzer & Bruns 1999) and is promising for other groups of Agaricomycotina (J.M. Moncalvo, unpublished). However, the use of this gene in Ascomycota proved to be problematic and was abandoned in the Assembling the Fungal Tree of Life (AFTOL) initiative (V. Hofstetter, personal communication to J.M.M.). Multiple divergent atp6 sequences were recovered from several lichen strains. These divergent atp6 sequences were hypothesized to originate from autonomously replicating plasmid-like DNA containing the atp6 gene, as observed in the maize pathogen Cochliobolus heterosporus (Lin et al. 1988; Hofstetter et al. 2004). The use of CO3 in Boletales was abandoned since it contained introns that interfered with PCR amplification (Kretzer & Bruns 1999). The NADH dehydrogenase subunit genes are absent from the mitochondrion of several yeasts (Ascomycota) (Bullerwell et al. 2003). In filamentous fungi, few studies have considered nad6 as a tool for fungal systematics and taxonomy. These studies emphasized the inadequacy of individual mitochondrial genes to resolve species phylogenies (Kouvelis et al. 2004; Pantou et al. 2006). Mitochondrial DNA (especially the intergenic sequences of the
Table 4 Efficiency of the ITS, 28S and four mitochondrial loci as DNA barcodes
*A species is considered resolved if all of its constituent sequences form a monophyletic cluster and are distinct from other sequences.
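
The resolution criterion in the Table 4 footnote can be illustrated with a small sketch. It assumes a rooted tree written as nested tuples with sequence names as leaves, and it checks only the monophyly part of the criterion: a species counts as resolved when its sequences form a clade containing no other sequence. The additional requirement that sequences be distinct from those of other species is not checked, and a species with a single sequence is trivially counted as resolved. All names are hypothetical.

```python
def clade_leaf_sets(tree):
    """Return the leaf set under every node of a rooted nested-tuple tree."""
    clades = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    walk(tree)
    return clades


def resolved_species(tree, species_of):
    """Map each species to True if its sequences form one exclusive clade."""
    clades = set(clade_leaf_sets(tree))
    members = {}
    for leaf, species in species_of.items():
        members.setdefault(species, set()).add(leaf)
    return {sp: frozenset(leaves) in clades for sp, leaves in members.items()}


def percent_resolved(tree, species_of):
    """Percentage of species whose sequences form an exclusive clade."""
    result = resolved_species(tree, species_of)
    return 100.0 * sum(result.values()) / len(result)
```

For example, with the tree `((("a1", "a2"), "b1"), ("c1", ("b2", "c2")))` and two sequences each for species A, B and C, only A is resolved, because the B and C sequences are intermixed.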
NADH dehydrogenase subunit genes) was nevertheless considered as a valuable tool for the discrimination of closely related species within Ascomycota (Kouvelis et al. 2004; Pantou et al. 2006).

Following our in silico results, we tested the efficiency of four of the 14 mitochondrial genes as DNA barcodes. First, we tried to amplify the CO1-5′ region by PCR since this locus was initially proposed as the universal barcode system for eukaryotes. Then, we compared the efficiency of the nad6, CO3 and atp6 genes with the 28S, ITS and CO1-5′ loci. To assess the potential of these loci as DNA barcodes, we generated a data set for two fungal genera with taxonomic difficulties (Table S1). The species complex Chrysomyxa ledi de Bary includes several cryptic species. At least six of them are distinguishable by their spore morphometry and/or uredinial host specificity (Crane 2001). A collection of Melampsora species sampled on aspen and white poplars was also included in this study. This data set includes the M. populnea species complex composed of at least four species distinguished through aecial host specificity, but morphologically similar (Pei & Shang 2005).

The number of primer pairs required for successful PCR amplifications and sequencing varied according to the locus and the data set (either Melampsora spp. or Chrysomyxa spp.) targeted (Table 4). In general, PCR amplification results obtained from these two rust genera were congruent with the results obtained in the in silico analyses. The PCR amplification of a ~600-bp product from the 5′-end of the CO1 gene generally required more than a single primer pair due to the occurrence of numerous large introns. In contrast, no intronic regions were found in the nad6 and CO3 genes among the 15 fungi considered in the preliminary in silico analysis, and successful PCR amplifications of these genes were obtained using only one single primer pair for the Melampsora and Chrysomyxa strains considered here. The amplification and sequencing of the ITS, CO1 and atp6 loci in the Chrysomyxa data set was particularly complicated. Two primer pairs were required to obtain readable ITS sequences. Even using multiple primer pairs in different amplification reactions, a maximum of 292 bp was obtained for the CO1 gene for 29% of the specimens tested (Table 4). We amplified more than 650 bp of the atp6 gene in the Chrysomyxa
data set except for two species: C. ledi and C. rhododendri. This failure might have occurred for these two species because of (i) the occurrence of an intron or (ii) the presence of polymorphisms at the primer sites, although these had been designed in a conserved part of the atp6 gene.

Neither the nuclear (ITS and 28S) nor the mitochondrial loci fully resolved the different rust taxa under study. Despite this, ITS and 28S provided greater taxonomic resolution than the mitochondrial genes (Table 4). Although nad6 and CO3 provided the same taxonomic resolution as ITS and 28S loci in the Chrysomyxa data set (90% of the species resolved; Table 4), these mitochondrial loci resulted in lower taxonomic resolution than the ribosomal loci in the Melampsora data set (from 20 to 60%).

Our work demonstrates that the sequences currently available in public databases are useful to conduct in silico molecular studies for a large taxonomic group such as Basidiomycota. We initially postulated that such in silico analyses could constitute a helpful resource for facilitating the choice of genes with sufficient degree of divergence at the appropriate taxonomic scale. This approach allowed us to anticipate difficulties for in vivo PCR amplification of mitochondrial genes in this group of fungi. We predicted that numerous sporadic introns should occur in mitochondrial genes across several genera in the Basidiomycota and could compromise the usefulness of these genes for DNA barcoding. Furthermore, we demonstrated that several fungal mitochondrial genes, including CO1 that had been proposed for DNA barcoding, exhibit a range of substantial interspecific divergence which constitutes one of the fundamental requirements for a species-level DNA identification system. Despite such potential for high divergence levels, the taxonomical resolution observed in mitochondrial genes varies depending on the combination locus/group of taxa considered. Finally, our comparison of four of these genes, CO1, atp6, CO3 and nad6, with nuclear ribosomal regions (ITS and 28S) in two rust data sets (including closely related species), revealed that ITS and 28S offer a better taxonomic resolution than the mitochondrial loci in spite of the lower potential we initially observed for the 28S locus in our in silico analyses.

Acknowledgements

The authors acknowledge David L. Joly for help with bioinformatics and Franck Orsupetru Stefani and Philippe Tanguay for comments on the manuscript. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Genome Canada for funding the Canadian Barcode of Life Network and the Fungal DNA Barcoding Initiative.

Conflict of interest statement

The authors have no conflict of interest to declare and note that the funders of this research had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Aanen DK, Kuyper TW, Hoekstra RF (2001) A widely distributed ITS polymorphism within a biological species of the ectomycorrhizal fungus Hebeloma velutipes. Mycological Research, 105, 284–290.
Aime CM, Matheny BP, Henk DA et al. (2006) An overview of the higher level classification of Pucciniomycotina based on combined analyses of nuclear large and small subunit rDNA sequences. Mycologia, 98, 896–905.
Ballard JWO, Whitlock MC (2004) The incomplete natural history of mitochondria. Molecular Ecology, 13, 729–744.
Bateman A, Coin L, Durbin R et al. (2004) The Pfam protein families database. Nucleic Acids Research, 32, D138–D141.
Bridge PD, Spooner BM, Roberts PJ (2005) The impact of molecular data in fungal systematics. Advances in Botanical Research, 42, 34–67.
Brudno M, Malde S, Poliakov A et al. (2003) Glocal alignment: finding rearrangements during alignment. Bioinformatics, 19, i54–i62.
Bruns TD (2001) ITS reality. Inoculum, 52, 2.
Bruns TD, White TJ, Taylor TJ (1991) Fungal molecular systematics. Annual Review of Ecology and Systematics, 22, 525–564.
Bullerwell CE, Lang FB (2005) Fungal evolution: the case of the vanishing mitochondrion. Current Opinion in Microbiology, 8, 362–369.
Bullerwell CE, Leigh J, Forget L, Lang FB (2003) A comparison of three fission yeast mitochondrial genomes. Nucleic Acids Research, 31, 759–768.
Burger G, Gray MW, Lang FB (2003) Mitochondrial genomes: anything goes. Trends in Genetics, 19, 709–716.
Chillali M, Idder-Ighili H, Guillaumin JJ et al. (1998) Variation in the ITS and IGS regions of ribosomal DNA among the biological species of European Armillaria. Mycological Research, 102, 533–540.
Collins TM, Wimberger PH, Naylor GJP (1994) Compositional bias, character-state bias, and character-state reconstruction using parsimony. Systematic Biology, 43, 482–496.
Coprinus Cinereus Sequencing Project. Broad Institute of MIT and Harvard http://www.broad.mit.edu.
Crane PE (2001) Morphology, taxonomy, and nomenclature of the Chrysomyxa ledi complex and related rust fungi on spruce and Ericaceae in North America and Europe. Canadian Journal of Botany, 79, 957–982.
Cryptococcus neoformans Genome Project, B3501A assembly data, Stanford Genome Technology Center, funded by the NIAID/NIH under cooperative agreement AI47087, and The Institute for Genomic Research, funded by the NIAID/NIH under cooperative agreement U01 AI48594. http://www-sequence.stanford.edu/group/C.neoformans/.
Cryptococcus Neoformans Serotype B Sequencing Project. Broad Institute of MIT and Harvard. http://www.broad.mit.edu.
Cummings MP, Otto SP, Wakeley J (1995) Sampling properties of DNA sequence data in phylogenetic analysis. Molecular Biology and Evolution, 12, 814–822.
Cywinska A, Hunter FF, Hebert PDN (2006) Identifying Canadian mosquito species through DNA barcodes. Medical and Veterinary Entomology, 20, 413–424.
Dalgaard JZ, Klar AJ, Moser MJ et al. (1997) Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases and identification of an intein that encodes a site-specific endonuclease of the HNH family. Nucleic Acids Research, 25, 4626–4638.
Eddy SR (1998) Profile hidden Markov models. Bioinformatics, 14, Kirk PM, Cannon PF, David JC, Stalpers JA (2001) Dictionary of the
755–763. Fungi, 9th edn. CABI Publishing, Wallingford, UK.
Foster PG, Hickey DA (1999) Compositional bias may affect both Klich MA, Mullaney EJ (1992) Molecular methods for identifica-
DNA-based and protein-based phylogenetic reconstructions. tion and taxonomy of filamentous fungi. In: Handbook of Applied
Journal of Molecular Evolution, 48, 284–290. Mycology (ed. Arora DK), pp. 35–57. Banara Hindu University,
Gardes M, Bruns TD (1993) ITS primers with enhanced specificity Varanasi, India.
for basidiomycetes-application to the identification of mycor- Kouvelis VN, Ghikas DV, Typas MA (2004) The analysis of the com-
rhizae and rusts. Molecular Ecology, 2, 113–118. plete mitochondrial genome of Lecanicillium muscarium (synonym
Grasso V, Sierotzki H, Gisi U (2006) Relatedness among agro- Verticillium lecanii) suggests a minimum common gene organiza-
nomically important rusts based on mitochondrial cytochrome tion in mtDNAs of Sordariomycetes: phylogenetic implications.
b gene and ribosomal ITS sequences. Journal of Phytopathology, Fungal Genetics and Biology, 41, 930–940.
154, 110–118. Kretzer AM, Bruns TD (1999) Use of atp6 in fungal phylogenetics:
Guarro J, Gene J, Stchigel AM (1999) Developments in fungal an example from the Boletales. Molecular Phylogenetics and
taxonomy. Clinical Microbiology Reviews, 12, 454–500. Evolution, 13, 483–492.
Hajibabaei M, deWaard JR, Ivanova N et al. (2005) Critical factors for Lang FB, Laforest M-J, Burger G (2007) Mitochondrial introns: a
assembling a high volume of DNA barcodes. Philosophical Trans- critical view. Trends in Genetics, 23, 119–125.
actions of the Royal Society B: Biological Sciences, 360, 1959–1967. Lim YW, Sturrock R, Leal I et al. (2008) Distinguishing homo-
Hall TA (1999) BioEdit: a user-friendly biological sequence align- karyons and heterokaryons in Phellinus sulphurascens using
ment editor and analysis program for Windows 95/98/NT. pairing tests and ITS polymorphisms. Antonie Van Leeuwenhoek
Nucleic Acids Symposium Series 41, 95–98. International Journal of General and Molecular Microbiology, 93,
Heath PJ, Stephens KM, Monnat RJ, Stoddard BL (1997) The struc- 99–110.
ture of I-CreI, a Group I intron-encoded homing endonuclease. Lin JJ, Garber RC, Yoder OC (1988) Nucleotide sequence of a fungal
Nature Structural Biology, 4, 468–476. plasmid-like DNA containing the mitochondrial ATPase subunit
Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological 6 gene. Nucleic Acids Research, 16, 9875.
identifications through DNA barcodes. Philosophical Transactions Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering
of the Royal Society B: Biological Sciences, 270, 313–321. evolutionary trees under a more realistic model of sequence
Hebert PDN, Gregory TR (2005) The promise for DNA barcoding evolution. Molecular Biology and Evolution, 11, 605–612.
for taxonomy. Systematic Biology, 54, 852–859. Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved
Hebert PDN, Penton EH, Burns JM, Janzen DH, Hallwachs W detection of transfer RNA genes in genomic sequence. Nucleic
(2004a) Ten species in one: DNA barcoding reveals cryptic species Acids Research, 25, 955–964.
in the neotropical skipper butterfly Astraptes fulgetor. Proceedings Lutzoni F, Kauff F, Cox JC et al. (2004) Assembling the fungal tree
of the National Academy of Sciences, USA, 101, 14812–14817. of life: progress, classification, and evolution of subcellular traits.
Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004b) Identi- American Journal of Botany, 91, 1446–1480.
fication of birds through DNA barcodes. Public Library of Science, Martin F, Aerts A, Ahren D et al. (2008) The genome of Laccaria
Biology, 2, 1657–1663. bicolor provides insights into mycorrhizal symbiosis. Nature,
Hibbet DS, Binder M, Bischoff JF et al. (2007) A higher-level 452, 88–92.
phylogenetic classification of the Fungi. Mycological Research, Martin FN, Bensasson D, Tyler BM (2007) Mitochondrial genome
111, 509–547. sequences and comparative genomics of Phytophtora ramorum
Hofstetter V, Miadlikowska J, Gueidan C et al. (2004) What do and P. sojae. Current Genetics, 51, 285–296.
protein-coding genes (ATP6, EF1-alpha and RNA polymerase II) Matheny PB, Wang Z, Binder M et al. (2007) Contributions of
bring to molecular systematics of lichens?. In: Book of Abstracts of rpb2 and tef1 to the phylogeny of mushrooms and allies
the 5th IAL Symposium: Lichens in Focus (ed. Press TU), p. 15. (Basidiomycota, Fungi). Molecular Phylogenetics and Evolution,
Tartu University Press, Tartu, Estonia. 43, 430–451.
Huang X, Madan A (1999) CAP3: a DNA sequence assembly pro- Mayor C, Brudno M, Schwartz JR et al. (2000) VISTA: vizualizing
gram. Genome Research, 9, 868–877. global DNA sequence alignments of arbitrary length. Bioinfor-
Huang X, Yang S-P, Chinwalla AT et al. (2006) Application of a matics Applications Note, 16, 1046–1047.
superword array in genome assembly. Nucleic Acids Research, Min XJ, Hickey DA (2007) Assessing the effect of varying sequence
34, 201–205. length on DNA barcoding of fungi. Molecular Ecology Notes, 7,
Ihako R, Gentleman R (1996) r: a language for data analysis and 365–373.
graphics. Journal of Computational and Graphical Statistics, 5, 299– Mohr G, Perlman PS, Lambowitz AM (1993) Evolutionary
314. relationships among groupII intron-encoded proteins and
IRGSP (2005) The map-based sequence of the rice genome. Nature, identification of a conserved domain that may be related to
436, 793–800. maturase function. Nucleic Acids Research, 21, 4991–4997.
James T, Kauff F, Schoch CL et al. (2006) Reconstructing the early evo- Moriyama EN, Powell JR (1997) Synonymous substitution rates
lution of Fungi using a six-gene phylogeny. Nature, 443, 818–822. in Drosophila: mitochondrial versus nuclear genes. Journal of
Katsu M, Kidd S, Ando A et al. (2004) The internal transcribed Molecular Evolution, 45, 378–391.
spacers and 5.8S rRNA gene show extensive diversity among Mourrier T, Hansen AJ, Willerslev E, Arctander P (2001) The
isolates of the Cryptococcus neoformans species complex. FEMS human genome project reveals a continuous transfer of large
Yeast Research, 4, 377–388. mitochondrial fragments to the nucleus. Molecular Biology and
Kimura M (1980) A simple method for estimating evolutionary Evolution, 18, 1833–1837.
rates of base substitutions through comparative studies of nucle- O’Donnell K, Cigelnik E (1997) Two divergent intragenomic rDNA
otide sequences. Journal of Molecular Evolution, 16, 111–120. ITS2 types within a monophyletic lineage of the fungus Fusarium
are nonorthologous. Molecular Phylogenetics and Evolution, 7, Thompson JD, Higgins DG, Gibson TJ (1994) ClustalW: improving
103–116. the sensitivity of progressive multiple sequence alignment through
Okabe I, Matsumoto N (2003) Phylogenetic relationship of Sclero- sequence weighting, position-specific gap penalties and weight
tium rolfsil (teleomorph Athelia rolfsii) and S.delphinii based on matrix choice. Nucleic Acids Research, 22, 4673–4680.
ITS sequences. Mycological Research, 107, 164–168. Tian G-L, Michel F, Macadre C, Slonimski PP, Lazowska J (1991)
Pamilo P, Viljakainen L, Vihavainen A (2007) Exceptionally high Incipient mitochondrial evolution in yeasts II. Journal of Molecular
density of numts in the honeybee genome. Molecular Biology and Biology, 218, 747–760.
Evolution, 24, 1340–1346. Tian C-M, Shang Y-Z, Zhuang J-Y, Wang Q, Kakishima M (2004)
Pantou M, Kouvelis V, Typas M (2006) The complete mitochon- Morphological and molecular phylogenetic analysis of Melampsora
drial genome of the vascular wilt fungus Verticillium dahliae: a species on poplars in China. Mycoscience, 45, 56–66.
novel gene order for Verticillium and a diagnostic tool for species Toor N, Zimmerly S (2002) Identification of a family of group II
identification. Current Genetics, 50, 125–136. introns encoding LAGLIDADG ORFs typical of group I introns.
Paquin B, Laforest M-J, Forget L et al. (1997) The fungal mitochon- RNA, 8, 1373–1377.
drial genome project: evolution of fungal mitochondrial genomes Vaughn JC, Mason MT, Sper-Whitis GL, Kulman P, Palmer JD
and their gene expression. Current Genetics, 31, 380–395. (1995) Fungal origin by horizontal transfer of a plant mitochon-
Paquin B, O’Kelly CJ, Lang FB (1995) Intron-encoded open reading drial group I intron in the chimeric Cox1 gene of Peperomia. Journal
frame of the GIY-YIG subclass in a plastid gene. Current Genetics, of Molecular Evolution, 41, 563–572.
28, 97–99. Vilgalys R, Hester M (1990) Rapid genetic identification and mapping
Pei MH, Shang YZ (2005) A brief summary of Melampsora species of enzymatically amplified ribosomal DNA from several Crypto-
on Populus. In: Rust Disease of Willow and Poplar (eds Pei MH, coccus species. Journal of Bacteriology, 172, 4238–4246.
McCracken AR), pp. 51–61. CABI Publishing, Wallingford, UK. Waugh J (2007) DNA barcoding in animal species: progress, poten-
Posada D, Crandall KA (1998) ModelTest: testing the model of tial and pitfalls. Bioessays, 29, 188–197.
DNA substitution. Bioinformatics, 14, 817–818. White TJ, Bruns TS, Lee S, Taylor JW (1990) Amplification and
Puccinia Graminis Sequencing Project. Broad Institute of MIT and direct sequencing of fungal ribosomal RNA genes for phylo-
Harvard. http://www.broad.mit.edu. genetics. In: PCR Protocols: A Guide to Methods and Applications
Ricchetti M, Fairhead C, Dujon B (1999) Mitochondrial DNA repairs (eds Innis MA, Gelfand DH, Sninsky JJ, White TJ), pp. 315–
double-stand breaks in yeast chromosomes. Nature, 402, 96–100. 322. Academic Press, New York.
Richly E, Leister D (2004) NUMTs in sequenced eukaryotic Xu J, Singh RS (2005) The inheritance of organelle genes and
genomes. Molecular Biology and Evolution, 21, 1081–1084. genomes: patterns and mechanisms. Genome, 48, 951–958.
Roe AD, Sperling FAH (2007) Patterns of evolution of mitochondrial Yan Z, Xu J (2005) Fungal mitochondrial inheritance and evolu-
cytochrome c oxidase I and II DNA and implication for DNA tion. In: Evolutionary Genetics of Fungi (ed. Xu J), pp. 221–252.
barcoding. Molecular Phylogenetics and Evolution, 44, 325–345. Horizon Scientific Press, Wymondham, UK.
Rokas A, Nylander JAA, Ronquist F, Stone GN (2002) A maximum- Zeng JS, De Hoog GS (2008) Exophiala spinifera and its allies: dia-
likelihood analysis of eight phylogenetic markers in gallwasps gnostics from morphology to DNA barcoding. Medical Mycology,
(Hymenoptera: Cynipidae): implications for insect phylogenetic 46, 193–208.
studies. Molecular Phylogenetics and Evolution, 22, 206–209.
Seifert KA, Samson RA, DeWaard JR et al. (2007) Prospects for
Supporting information

Additional supporting information may be found in the online version of this article:

Fig. S1 Nucleotidic alignments obtained for 14 mitochondrial genes encoding subunits of the respiratory chain complexes. The sequences were obtained from available Basidiomycete genomic resources as detailed in the Material and methods section.

Fig. S2 Dendrograms constructed with the neighbour-joining algorithm based on the K2P distance matrices of the ITS, 28S, CO1, atp6, CO3 and nad6 nucleotidic-sequence alignment for the Melampsora and the Chrysomyxa data sets.

Fig. S3 G + C content comparisons between the nuclear and mitochondrial contigs considered in this study. (A) Plot of the G + C content of each contig (nuclear and mitochondrial) considered in the genome assemblies; (B) Box plot of G + C content in mitochondrial and nuclear contigs considered in the genome assemblies.

Table S1 Information about Chrysomyxa and Melampsora specimens used in this study

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
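
For readers unfamiliar with the K2P (Kimura two-parameter) distances behind the dendrograms in Fig. S2, the following is a minimal Python sketch of the standard formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. It is an illustration only, not the software used in the study. The function name `k2p_distance` and the decision to skip gapped or ambiguous sites pairwise are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with a gap or an
    ambiguous base in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # gap or ambiguity code: ignore this site
        compared += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; any other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(a * math.sqrt(b))


if __name__ == "__main__":
    # One transition in ten compared sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Applying such a function to every pair of sequences in an alignment gives the distance matrix that a neighbour-joining routine takes as input.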