
Molecular Ecology Resources (2009) 9 (Suppl. 1), 99–113 doi: 10.1111/j.1755-0998.2009.02637.

BARCODING FUNGI
Blackwell Publishing Ltd

Evaluation of mitochondrial genes as DNA barcode for Basidiomycota
AGATHE VIALLE,1*† NICOLAS FEAU,1† MATHIEU ALLAIRE,† MARYNA DIDUKH,‡ FRANCIS MARTIN,¶ JEAN-MARC MONCALVO§ and RICHARD C. HAMELIN*†
*Centre d'étude de la forêt, Université Laval, QC, Canada G1K 7P4, †Laurentian Forestry Centre, Canadian Forest Service, Natural
Resources Canada, 1055 du PEPS, PO Box 10380, Stn Sainte-Foy, QC, Canada G1V 4C7, ‡Department of Ecology and Evolutionary
Biology, University of Toronto, ON, Canada M5S 3B2, §Department of Natural History, Royal Ontario Museum, and Department of
Ecology and Evolutionary Biology, University of Toronto, ON, Canada M5S 2C6, ¶Unité Mixte de Recherche INRA/UHP 1136
‘Interactions Arbres/Microorganismes’, Institut National de la Recherche Agronomique, Centre de Recherches de Nancy, 54280
Champenoux, France

Abstract
Our study evaluated in silico the potential of 14 mitochondrial genes encoding the subunits of the respiratory chain complexes, including cytochrome c oxidase I (CO1), as DNA barcodes for Basidiomycota. Fifteen complete and partial mitochondrial genomes were recovered and characterized in this study. Mitochondrial genes showed high values of molecular divergence, indicating a potential for the resolution of lower-level relationships. However, numerous introns occurred in CO1 as well as in six other genes, potentially interfering with polymerase chain reaction amplification. Considering these results and given the minimal length of 600 bp that is optimal for a fungal barcode, the genes encoding ATPase subunit 6, cytochrome oxidase subunit 3 and NADH dehydrogenase subunit 6 have the most promising characteristics for DNA barcoding among the mitochondrial genes studied. However, biological validation on two fungal data sets indicated that no single mitochondrial gene gave a better taxonomic resolution than the ITS, the region already widely used in fungal taxonomy.
Keywords: Basidiomycota, CO1, DNA barcoding, ITS, mitochondrial genes

Received 31 October 2008; revision received 26 January 2009; accepted 30 January 2009

Correspondence: Richard C. Hamelin, Fax: 001 418648 5849; E-mail: Richard.hamelin@ubc.ca
1 Equal contributors

© 2009 Blackwell Publishing Ltd and Crown in the right of Canada

Introduction

Basidiomycota is one of the major fungal phyla with more than 30 000 species described and encompasses a broad range of taxa, morphologies, ecologies and life-history strategies (Kirk et al. 2001). Many recent molecular phylogenetic studies using nuclear genes have significantly improved our understanding of higher taxonomic level evolutionary relationships in this phylum (Lutzoni et al. 2004; James et al. 2006; Hibbett et al. 2007). However, species delimitation and identification remain problematic in many groups, particularly among morphologically similar taxa, for example in the Pucciniomycotina and Ustilaginomycotina (Stoll et al. 2003; Bridge et al. 2005; Aime et al. 2006).

DNA barcodes are short and standardized sequences of nucleotides from a genomic region universally present in target lineages and exhibiting sufficient sequence variation to distinguish species (Hebert et al. 2003; Waugh 2007). DNA barcoding promises fast, economic and easy identification of species (Hebert et al. 2003; Hajibabaei et al. 2005). The selection of a barcode locus is a compromise between the possibility to design universal DNA primers for polymerase chain reaction (PCR) amplification and the need for maximal molecular evolution rates for sufficient discrimination between closely related taxa. Fungal molecular systematics has relied heavily on analysis of the nuclear ribosomal RNA (rRNA) cluster that comprises the small (18S) and large (28S) ribosomal subunits (Bridge et al. 2005). In general terms, sequence divergence levels in the 18S and 28S usually allow differentiation of higher taxonomic levels such as families and genera while polymorphisms in the internal transcribed spacer regions (ITS) generally differentiate between species
(Bruns et al. 1991; Bridge et al. 2005). The ITS can easily be amplified from many fungal taxa using a limited set of primers (White et al. 1990; Gardes & Bruns 1993). Furthermore, based on the finding that differences among species are generally consistently larger than those within species, ITS sequences have been frequently utilized for fungal taxa delimitation and identification (Bruns et al. 1991; Guarro et al. 1999; Bruns 2001; Katsu et al. 2004). This level of polymorphism makes the ITS a logical candidate for DNA barcoding of fungi (Zeng & De Hoog 2008).

However, utilization of this region for taxonomic and phylogenetic purposes in fungi can present some limitations. On the one hand, a lack of variation among closely related species has been observed even when the species’ borders were well defined and supported by other molecular, morphological and/or biological data. For example, molecular taxonomic studies have demonstrated the inefficiency of the ITS to resolve well-characterized species in Heterobasidion, Armillaria, Fusarium and Penicillium genera (Chillali et al. 1998; Skouboe et al. 1999; Bruns 2001; Seifert et al. 2007). Furthermore, in some cases, the ITS regions could not reveal known cryptic species in fungal species complexes (Tian et al. 2004; Seifert et al. 2007); the resolution of these taxonomically challenging and economically important groups should be one of the most important attributes of a fungal DNA barcode. On the other hand, intraspecific and even intra-individual variations have been found in some fungal groups for these loci (O’Donnell & Cigelnik 1997; Aanen et al. 2001; Okabe & Matsumoto 2003; Lim et al. 2008). Such polymorphisms in the ITS can either result from differences within nuclei (heterogeneity among repeats) or from differences between nuclei (dikaryotic and multinucleate fungi) (Aanen et al. 2001; Okabe & Matsumoto 2003). Heterogeneity among repeats creates complications for direct sequencing of ITS-PCR products because of the occurrence of multiple different ITS copies in a single fungal isolate (Aanen et al. 2001; Matheny et al. 2007) and could make it difficult to accurately define taxonomic groups at the species level (Aanen et al. 2001; Smith et al. 2007).

Mitochondrial DNA (mtDNA) has a simple genetic structure, a limited exposure to genetic recombination, and rapid rates of evolution compared with nuclear DNA (Xu & Singh 2005; Waugh 2007) and is thus well suited to design molecular markers for the study of closely related taxa (Hebert et al. 2003). A 648-bp region at the 5′-end of the mitochondrial gene encoding cytochrome c oxidase I (CO1-5′) has been proposed as the core of global bio-identification systems for eukaryotes (Hebert et al. 2003; Hebert & Gregory 2005; Waugh 2007). CO1-5′ appears to possess a greater potential as species-specific marker than any other mitochondrial gene as confirmed by taxonomic resolution (> 95%) in most Metazoa groups, including Lepidoptera and birds (Hebert et al. 2003, 2004a, b; Cywinska et al. 2006). Promising results have recently been published using mitochondrial coding gene regions, alone or concatenated, for fungal taxonomy and systematics (Paquin et al. 1997; Grasso et al. 2006; Seifert et al. 2007). The CO1 gene was suitable as a barcode for discriminating fungal species in Penicillium, a taxonomically challenging genus within ascomycetous fungi (Min & Hickey 2007; Seifert et al. 2007). Nevertheless, despite this specific case study, the use of CO1-5′ as a DNA barcoding region for delimiting closely related fungal species has not been extensively studied. Some characteristics of the fungal mitochondrial genome represent significant obstacles for the elaboration of a barcode. Copies of mtDNA genes (or gene regions) can occur in the nuclear genome [the so-called nuclear mitochondrial DNA (NUMT)] (Burger et al. 2003; Richly & Leister 2004; Song et al. 2008) and in the mitochondrial genome (Martin et al. 2007). Group I or II introns are present in the mitochondrial genes with variable frequencies depending on the species sampled, with a strong preference for protein-coding genes (Paquin et al. 1997; Burger et al. 2003; Yan & Xu 2005). Furthermore, functional constraints, unequal substitution patterns and heterogeneous evolutionary rates found in mtDNA genes can potentially affect its usefulness for species delimitation (Paquin et al. 1997; Roe & Sperling 2007).

In this study, we present a bioinformatics approach that aims to evaluate the potential of 14 mitochondrial genes commonly found in fungal mitochondrial genomes as DNA barcodes for the Basidiomycota. These genes encode hydrophobic subunits of the respiratory chain complexes I, III and IV [including apocytochrome b (cob); cytochrome oxidase subunits 1, 2 and 3 (CO1 to CO3); NADH dehydrogenase subunits 1, 2, 3, 4, 4L, 5 and 6 (nad1 to nad6, nad4L), and ATPase subunits 6, 8 and 9 (atp6, atp8 and atp9)]. Our strategy first consisted in carrying out an inventory of the publicly available mitochondrial sequence data for this group of fungi to recover a maximum of complete mitochondrial genomes. We evaluated the structure of each coding gene to assess (i) intron position and size; (ii) the occurrence of partial or full length copies of the gene in both nuclear and mtDNA; and (iii) the information provided in the context of barcoding. We then assessed and compared the potential for species level taxonomic resolution with the resolution obtained for the ITS and 28S loci by sequencing 38 strains within the Chrysomyxa and Melampsora genera which represent taxonomically challenging groups in Pucciniomycotina.

Materials and methods

Recovery of mtDNA genome sequences and protein coding genes for Basidiomycota

Three strategies were designed to recover the entire mitochondrial genome of 17 Basidiomycota species from various public databases (Fig. 1). In the first strategy (Fig. 1-1), six


Fig. 1 Bioinformatic process showing three strategies designed to recover fungal mitochondrial genomes. Grey boxes correspond to available data at the origin of each strategy. Plain line boxes present the tools (query genes, blast algorithm, program) used to recover mitochondrial genomes in the different strategies. Dotted line boxes show possible output of the different tool boxes during the mitochondrial genome assembly.
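
The screening steps in strategies 2 and 3 amount to keeping database hits below an E-value cutoff and collecting the contigs on which they fall. The sketch below illustrates that idea only. It assumes the searches were exported in BLAST tabular format (12 tab-separated columns); the function name `candidate_contigs`, the gene and contig labels, and the cutoff passed in the example are hypothetical and not taken from the study.

```python
from collections import defaultdict

# Column order of BLAST tabular output (-outfmt 6); an assumed input format.
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()


def candidate_contigs(lines, max_evalue):
    """Return contig ids hit at or below max_evalue, ranked by the number
    of distinct query genes that hit each contig (ties broken by name)."""
    genes_per_contig = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        if float(row["evalue"]) <= max_evalue:
            genes_per_contig[row["sseqid"]].add(row["qseqid"])
    return sorted(genes_per_contig,
                  key=lambda contig: (-len(genes_per_contig[contig]), contig))


# Hypothetical hits, for illustration only (not data from the study).
hits = [
    "cox1\tcontig_7\t88.0\t300\t36\t0\t1\t300\t10\t309\t1e-80\t250",
    "atp6\tcontig_7\t85.0\t200\t30\t0\t1\t200\t900\t1099\t1e-40\t150",
    "cox1\tcontig_2\t70.0\t60\t18\t0\t1\t60\t5\t64\t1e-05\t40",
    "nad1\tcontig_9\t90.0\t250\t25\t0\t1\t250\t1\t250\t1e-60\t200",
]
print(candidate_contigs(hits, max_evalue=1e-20))  # ['contig_7', 'contig_9']
```

Ranking by the number of distinct query genes is a design choice of this sketch: a contig matched by several mitochondrial genes is a more plausible mitochondrial candidate than one matched by a single short hit.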

complete and annotated mtDNA genomes [Cryptococcus neoformans var. grubii (strain H99), Moniliophthora perniciosa, Pleurotus ostreatus, Schizophyllum commune, Tilletia indica and Ustilago maydis] were obtained from the National Center for Biotechnology Information (NCBI) Genomes databases [http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/fu.html], and two complete but non-annotated genomes [C. neoformans var. neoformans (strain JEC21) and Puccinia graminis f. sp. tritici] were obtained from the Stanford Genome Technology Center and the Broad Institute websites, respectively. The second strategy (Fig. 1-2) was aimed at recovering sequence contigs that correspond to the mitochondrial genome of Coprinus cinereus and Cryptococcus neoformans var. gattii (strain R265) within assembled contigs from whole genomes. The available expressed sequence tags (ESTs) for C. cinereus (15 715 ESTs) were screened (tblastx, E-value threshold at 1e-20) for the presence of mitochondrial ESTs using a set of mitochondrial protein coding genes as well as rRNA and tRNA of C. neoformans var. grubii (strain H99), M. perniciosa, S. commune and U. maydis as queries. The identification of the corresponding mitochondrial contig was performed by searching the C. cinereus whole-genome sequence using the tblastx program (E-value threshold at 1e-20) and submitting each EST recovered in the last search. Likewise, the corresponding mitochondrial contig of C. neoformans var. gattii (strain R265) was identified using C. neoformans var. grubii (strain H99) mitochondrial genes as query. The third strategy (Fig. 1-3) consisted in the construction of a mitochondrial genome assembly for seven other species (Laccaria bicolor (Martin et al. 2008), Phanerochaete chrysosporium, Phakopsora pachyrhizi, Postia placenta, Sporidiobolus salmonicolor, Sporobolomyces roseus and Ustilago hordei) using available whole-genome shotgun sequences (WGS) recovered from Trace Archive databases at the NCBI website (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?). For the latter fungi, all traces were assembled using the Pcap.Rep program (Huang et al. 2006) with default parameters. Each whole-genome assembly obtained was screened for potential mitochondrial contig(s) using the method described above (strategy 2). If more than one single contig was retrieved, a new assembly was performed with the Cap3 program implemented with default parameters (Huang & Madan 1999). In order to improve the comprehensiveness of this last mitochondrial genome assembly, an additional blastn search (E-value threshold at 1e-20) was performed against the Trace Archive database using the 14 genes encoding subunits of the respiratory chain complexes of each complete and annotated mtDNA genome [C. neoformans (strain H99), M. perniciosa, P. ostreatus, S. commune and U. maydis] as queries. All similar sequences retrieved by this search were appended to the previous assembly with one additional run of Cap3 contig assembly (Fig. 1, step 4).

In addition, 24 mitochondrial-protein encoding sequences (complete CDS features) for nine strains of eight other


Basidiomycota species [Agaricus bitorquis, Agrocybe aegerita, Agrocybe chaxingu, Cryptococcus neoformans var. neoformans (strain IFM5844), Cryptococcus neoformans var. grubii (strain IFO410), Rhodotorula glutinis, Suillus luteus, Suillus sinuspaulianus and Trimorphomyces papilionaceus] were directly downloaded from the NCBI website. All the sequence data considered in this study are detailed in Table 1. The sequence alignments obtained for the 14 mitochondrial genes encoding subunits of the respiratory chain complexes are available in Fig. S1, Supporting information.

Characterization of 14 genes encoding the subunits of the respiratory chain complexes

Fourteen genes encoding subunits of the respiratory chain complexes that are typically present in the mitochondrion of Basidiomycota were analysed. Each mitochondrial gene was individually edited in BioEdit version 7.0.0 (Hall 1999) and characterized as follows. First, the protein-coding, intronic and intergenic regions were localized based on multiple comparisons with both the available annotations of the S. commune (NC003049) and U. maydis (NC008368) mitochondrial genomes. These multiple comparisons were performed using the Shuffle-LAGAN program (Brudno et al. 2003) accessible on the mVISTA website (http://genome.lbl.gov/vista/mvista/submit.shtml) (Mayor et al. 2000). Each intronic region localized was characterized by specific protein domain identification and intron secondary structure modelling. The conserved sequence domains found in group-I intron endonucleases [e.g. LAGLIDADG 1 and 2 (Dalgaard et al. 1997; Heath et al. 1997) and GIY-YIG (Tian et al. 1991; Paquin et al. 1995)] and group-II intron maturases [domain X (Mohr et al. 1993)] were identified by HMM searches (Hmmer version 2.32; E-value ≤ 1e-02) against the Pfam database (Eddy 1998; Bateman et al. 2004). Then, the secondary structures of groups I and II introns were predicted using the ERPIN search algorithm implemented in the RNAweasel prediction tool (http://megasun.bch.umontreal.ca/RNAweasel/) (Lang et al. 2007). Second, tRNA contents were predicted with the tRNAscan-SE version 1.21 program (http://lowelab.ucsc.edu/tRNAscan-SE/) using the default search mode and mitochondrial models as source (Lowe & Eddy 1997).

Presence of mtDNA copies in nuclear and mitochondrial genomes

mtDNA copies longer than 20 nt were inferred from blastn hits (E-value < 0.01). For each fungus, each of the 14 genes encoding subunits of the respiratory chain complexes (complete gene sequence including both exonic and intronic regions) was used as the query sequence to search against its own mitochondrial genome and (when available) its own complete genome assembly.

Pattern of evolution of 14 mitochondrial genes

The 14 genes encoding subunits of the respiratory chain complexes were initially considered. For comparison, alignments of 18S (1699 nucleotides) and 28S (593 nucleotides located at the 5′-end of the gene) sequences obtained from the public databases for the same Basidiomycota taxa were included (Table 1). Orthologous sequences were confirmed with the Aspergillus niger (strain N909) mitochondrion complete genome sequence (DQ207726) to serve as outgroup. Inserts were excluded and short gaps were coded as missing data. For each gene alignment, Kimura 2-parameter distances (K2P; Kimura 1980) between sequences were computed pairwise using PAUP version 4.0b10 for Unix (Swofford 2003). Statistical significance of differences between distance distributions obtained for the 14 genes was determined by one-way ANOVA followed by a Tukey test using the R statistical package (Ihaka & Gentleman 1996).

The best-fit maximum-likelihood (ML) model of sequence evolution was identified for each of the 14 gene alignments using the likelihood ratio test implemented in the ModelTest version 3.7 program for Unix (Posada & Crandall 1998). The parameters allowed to vary in model fitting were base composition, substitution rates [variation in transition/transversion (ti/tv) ratio] and rate of heterogeneity across sites (by both the invariable-sites model and the gamma-distributed rates model).

Efficiency of the mitochondrial DNA barcode candidates

To investigate the efficiency of the mitochondrial DNA barcode candidates, we concentrated on the resolution of species of the Melampsora and Chrysomyxa genera that cause rust diseases on plants. Specimens were obtained from fresh collections and from three national Canadian herbaria. We considered 15 Melampsora specimens representing five Melampsora species (one to five specimens per species) collected on aspen and white poplars, and 23 specimens of Chrysomyxa collected on spruce and Ericaceae (10 species, one to three specimens per species). Species identification was based on morphological traits and specificity to the plant host.

Total genomic DNA was extracted using a modified protocol of the DNeasy Plant Mini kit (QIAGEN). For each specimen, a single sorus (uredinium or aecium) was excised from the infected host tissues. Infected plant material was then incubated for 2 h at 55 °C in 500 µL of lysis buffer with 10 µL of Proteinase K before using the manufacturer’s protocol. Full length ITS sequences (~620 bp) were amplified using primers ITS1F (Gardes & Bruns 1993) and ITS4BR (5′-TCAACAGACTTGTACATGGTCC-3′) for Melampsora. A 730-bp ITS sequence of the Chrysomyxa specimen was amplified using two primer pairs, ITS1F (Gardes & Bruns 1993)/ITS2R2 (5′-GACACTCAAACAGGTGTACCTT-3′) and


Table 1 Available data for the Basidiomycota considered in this study

Publicly available data

Taxa | Whole genome shotgun sequences | Complete genome assembly | Mitochondrial genome assembly | Trace archive file | 28S | 18S | Source and/or GenBank Accession no. (in parentheses)

Available genome sequences

Coprinus cinereus X X X Broad Institute/(AF041494)/(M92991)
Cryptococcus neoformans var. neoformans (strain JEC21) X X (not annotated) X X NCBI/Stanford Genome Technology Center (SGTC)
Cryptococcus neoformans var. grubii (strain H99) X X X X X Broad Institute/(NC004336)/(AJ560308)
Cryptococcus neoformans var. gattii (strain R265) X X X X Broad Institute
Laccaria bicolor X X X X Joint Genome Institute (JGI)/NCBI (WGS)/(DQ071702)
Moniliophthora perniciosa X X X (NC005927)/(AY916742)/(AY916739)
Phakopsora pachyrhizi X X X NCBI (WGS)/(DQ354537)
Phanerochaete chrysosporium X X JGI/NCBI (WGS)
Pleurotus ostreatus X X X (EF204913)/(DQ071722)/(AY657015)
Postia placenta X X (incomplete) X X X JGI/NCBI (WGS)/(AF139970)
Puccinia graminis X X X (not annotated) X X Broad Institute/(DQ417387)/(AY125409)
Schizophyllum commune X X X (NC003049)/(DQ071725)/(X54865)
Sporidiobolus salmonicolor X NCBI (WGS)
Sporobolomyces roseus X X JGI/NCBI (WGS)
Tilletia indica X X (DQ993184)/(AY818977)
Ustilago hordei X NCBI (WGS)
Ustilago maydis X X X X Broad Institute/(NC008368)/(AF453938)/(X62396)

Additional corresponding mitochondrial gene sequences

Taxa | Gene (GenBank Accession no.)
Agaricus bitorquis | atp6 (U60235)
Agrocybe aegerita | CO1 (AF010257); cob (AY781064)
Agrocybe chaxingu | cob (AY772389)
Cryptococcus neoformans var. neoformans (strain IFM 5844) | atp6 (AY560609); atp9 (AY560609); CO1 (AY560609); CO2 (AY138989); nad6 (AF533432)
Cryptococcus neoformans var. grubii (strain IFO410) | atp6 (AY560610); atp9 (AY560610); CO1 (AY560610); CO2 (AF534130); cob (AY560608); nad4L (AF538353)
Rhodotorula glutinis | atp8 (AB248915); atp9 (AB248915); CO3 (AB248915); nad5 (AB248915); nad6 (AB248915)
Suillus luteus | atp6 (AF002135)
Suillus sinuspaulianus | CO3 (AF002136)
Trimorphomyces papilionaceus | cob (X85236); nad1 (X73821)
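
For the alignments built from the sequences in Table 1, divergence is expressed as Kimura 2-parameter (K2P) distances, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The study computed these distances with a dedicated phylogenetics package; the following is only an illustrative re-implementation under stated assumptions. The function name `k2p_distance` is hypothetical, and sites carrying a gap or an ambiguity code in either sequence are skipped, mirroring the treatment of gaps as missing data.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence is not A, C, G or T are ignored.
    math.log raises ValueError when the sequences are too divergent
    for the distance to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: treated as missing data
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    if transitions == 0 and transversions == 0:
        return 0.0
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten compared sites: P = 0.1, Q = 0.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```

Applied to every pair of sequences in an alignment, the function yields the distance matrix on which intra- and interspecific comparisons and neighbour-joining clustering are based.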
ITS3R2 (5′-AAGGTACACCTGTTTGAGTGTC-3′)/ITS4BR (5′-TCAACAGACTTGTACATGGTCC-3′). A 740-bp 28S sequence was amplified using the ITS4-BRf (5′-GGACCATGTACAAGTCTGTTGA-3′) and LR5 primers (Vilgalys & Hester 1990) for both Melampsora and Chrysomyxa. PCRs were carried out in a 25-µL reaction volume consisting of 1 µL of undiluted DNA template, 0.35 µM of each primer, 0.2 mM of each dNTP (GE Healthcare), 1.6 mM MgCl2, 1 µg/µL of BSA, and 1 U of Platinum Taq DNA polymerase (Invitrogen) in a 1× Taq DNA polymerase buffer (20 mM Tris-HCl, pH 8, 50 mM KCl), with thermocycling conditions as follows: denaturation for 3 min at 94 °C, 35 cycles at 94 °C for 30 s, 30 s at 50 °C, and 70 s at 72 °C with a final extension of 10 min at 72 °C.

Primer pairs were designed for CO1-5′, CO3, atp6 and nad6. For Melampsora spp., two primer pairs, Cox1MlpAF (5′-TAAGATGACTTTATAGTACCAA-3′)/Cox1MlpAR (5′-GCTCCTACCATTACMG-3′) and Cox1MlpB2F (5′-CTGCTATGCCCAAGTCTAA-3′)/Cox1MlpCR (5′-ATGTGATGACTTCAAACCAC-3′), were used for the amplification of two composite non-overlapping regions located at the 5′-end of the CO1 gene. One primer pair, Nad6MLP1F (5′-ATGAATTGAGCTCTAAATACCATCT-3′) and Nad6MLP1R (5′-TTGTCACTTGTCATTACAATAGG-3′), was used for the amplification of 500 bp of the nad6 gene. Around 660 bp of the CO3 gene was amplified with CO3_F1 (5′-TCAGTATGTTATTTTAACGATGTAG-3′) and CO3_R1 (5′-TCCTCATCAGTAAACACTAATA-3′), and 650 bp of the atp6 gene was amplified with the atp6_F1 (5′-TAGAGCAATTTGAAGTTCAGAATCT-3′)/atp6_R1 (5′-GATGAATGATACTGCGATCTCT-3′) primer pair.

For the Chrysomyxa specimens, the Cox1MlpAF/Cox1MlpAR and Cox1MlpB2F/Cox1MlpCR primer pairs were tested but only a short fragment of the CO1 gene was amplified with the C30 (5′-GCARTTCTRTATTTTGTATTTGG-3′)/C346CR (5′-CGCWCCTACTAYTASHGG-3′) primer pair. A 450-bp fragment from the mitochondrial nad6 was amplified using the 23 N6f (5′-CCAGTAACGTCWGTAGTRTATC-3′) and 504 N6 r (5′-GCAGRAATACRAAAGAGGC-3′) primer pair. The CO3_F1/CO3_R1 and the atp6_F1/atp6_R1 primer pairs were used to amplify 660 bp of the CO3 gene and 650 bp of the atp6 gene, respectively.

All PCRs of mitochondrial genes were carried out in a 20-µL reaction volume consisting of 1 µL of undiluted DNA template, 0.2 µM of each primer, 0.15 mM of each dNTP (GE Healthcare), 1.5 mM MgCl2, and 1 U of Platinum Taq DNA polymerase (Invitrogen) in a 1× Taq DNA polymerase buffer (20 mM Tris-HCl, pH 8, 50 mM KCl), with conditions as follows: denaturation for 3 min at 95 °C, 36 cycles at 95 °C for 45 s, 47 °C for 30 s with primer pair Cox1MlpAF/Cox1MlpAR, Nad6MLP1F/Nad6MLP1R, CO3_F1/CO3_R, atp6_F1/atp6_R1 and C30/C346 or 50 °C for 30 s with Cox1MlpB2F/Cox1MlpCR and 23 N6f/504 N6 r, and 72 °C for 70 s. PCR products were visualized by ultraviolet fluorescence following 1.5% agarose gel electrophoresis in 1× TAE buffer and ethidium bromide staining. Sequencing was then performed in both directions with the appropriate amplification primers using the Big Dye Terminator Cycle Sequencing Kit version 1.1 on an ABI 3730xl sequencer (Applied Biosystems) at the CHUL Research Centre (CRCHUL) Sequencing and Genotyping Platform, Quebec City, QC, Canada.

Sequences were manually edited using BioEdit version 7.0.9 (Hall 1999) to remove ambiguous base calls and primer sequences, and were aligned using the ClustalW software (Thompson et al. 1994). For each data set obtained, we used the methods previously described to determine K2P intra- and interspecific genetic distances between pairwise sequences among each data set using PAUP. Dendrograms were constructed using the neighbour-joining algorithm based on the K2P-distance matrices, as described in Hebert et al. (2003) and Seifert et al. (2007). A barcode species was defined when all of its constituent sequences formed a monophyletic cluster corresponding with morphological and host affinity traits. Specimen information, including sequence GenBank Accession numbers, is presented in Table S1, Supporting information, and is available in the Barcode of Life Database (http://www.barcodinglife.org). Dendrograms obtained for this study are presented in Fig. S2, Supporting information.

Results and discussion

Characterization of mitochondrial genomes of Basidiomycota

A total of 15 mitochondrial genomes were retrieved (Table 2), among which seven [Coprinus cinereus, Cryptococcus neoformans var. gattii (strain R265), Laccaria bicolor, Phanerochaete chrysosporium, Phakopsora pachyrhizi, Postia placenta and Sporobolomyces roseus] were recovered using the reconstruction strategies 2, 3 and 4 of our bioinformatic process (Fig. 1). However, we were unable to construct comprehensive mitochondrial contigs from the traces available for Sporidiobolus salmonicolor (24 879 traces) and Ustilago hordei (1607 traces). These mitochondrial sequences were consequently excluded from the assembly by the Pcap.Rep program.

Specific features common to mtDNA were observed in the mitochondrial genomes (Klich & Mullaney 1992; Yan & Xu 2005). First, the overall G + C content found in the mitochondrial contigs ranged between 21.9% and 37.78%, with a mean of 31.08%, whereas the G + C content in nuclear contigs oscillated around 50% (see Fig. S3, Supporting information). Second, mitochondrial genome size ranged from 24.8 kb for C. neoformans var. grubii (strain H99) to more than 100 kb for Moniliophthora perniciosa and P. placenta (Table 2). Such length differences can be explained by the high variation in gene and intron numbers in mitochondrial genomes. For example, the ribosomal protein of the small ribosomal


Table 2 General features of mitochondrial genomes for the Basidiomycota considered in this study

Species | Final assembly size (nt) | No. of contigs | G+C content % | Other features*: rns§ | rnl§ | No. of tRNA | Rps3§ | Insert: Total | No. of putative group-I intron | No. of putative group-II intron
C. cinereus‡ | 42 448 | 1 | 28.30 | √ | √ | 24 | √ | 1 | 1 | 0
C. neoformans var. neoformans (strain JEC21) | 33 199 | 1 | 34.80 | √ | √ | 21 | √ | 9 | 9 | 0
C. neoformans var. grubii (strain H99) | 24 874 | 1 | 34.98 | √ | √ | 21 | — | 2 | 2 | 0
C. neoformans var. gattii (strain R265)‡ | 34 790 | 1 | 33.86 | √ | √ | 21 | √ | 7 | 7 | 0
L. bicolor‡ | 95 303 | 1 | 28.32 | √ | √ | 25 | √ | 7 | 7 | 0
M. perniciosa | 109 103 | 1 | 31.89 | √ | √ | 28 | √ | 13 | 13 | 0
P. pachyrhizi‡† | 35 070 | 1 | 34.72 | √ | √ | 24 | — | 5 | 5 | 0
P. chrysosporium‡† | 40 093 | 12 | 28.76 | √ | — | 3 | — | ... | ... | ...
P. ostreatus | 73 242 | 1 | 26.35 | √ | √ | 26 | √ | 9 | 9 | 0
P. placenta‡† | 114 196 | 4 | 27.34 | √ | √ | 23 | — | 17 | 14 | 1
P. graminis | 79 448 | 1 | 37.17 | √ | √ | 22 | √ | 13 | 12 | 1
S. commune | 49 704 | 1 | 21.86 | √ | √ | 27 | √ | 0 | 0 | 0
T. indica | 65 147 | 1 | 28.86 | √ | √ | 24 | — | 9 | 7 | 0
S. roseus‡† | 45 385 | 26 | 37.78 | √ | √ | 19 | — | 1 | 1 | 0
U. maydis | 56 814 | 1 | 31.20 | √ | √ | 23 | √ | 10 | 10 | 0

*in addition to the 14 genes encoding the hydrophobic subunits of the respiratory chain complexes common to fungal mitochondrial genomes CO1–2, cob, nad1–6, atp6–9
†incomplete mitochondrial genome sequence, some mitochondrial protein-coding genes not retrieved
‡genome recovered using the reconstruction strategies 2, 3 and 4 of the bioinformatic process described in this study (Fig. 1)
(...) missing data
§target gene (√) retrieved or (—) not retrieved
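
The G + C range and mean quoted in the text can be checked directly against the values in Table 2. The short sketch below does so; the percentages are transcribed from the table, while the helper names `gc_content` and `summarize` are illustrative choices and not part of the original analysis.

```python
from statistics import mean

# G+C content (%) of each mitochondrial assembly, transcribed from Table 2.
GC_PERCENT = {
    "C. cinereus": 28.30,
    "C. neoformans var. neoformans (JEC21)": 34.80,
    "C. neoformans var. grubii (H99)": 34.98,
    "C. neoformans var. gattii (R265)": 33.86,
    "L. bicolor": 28.32,
    "M. perniciosa": 31.89,
    "P. pachyrhizi": 34.72,
    "P. chrysosporium": 28.76,
    "P. ostreatus": 26.35,
    "P. placenta": 27.34,
    "P. graminis": 37.17,
    "S. commune": 21.86,
    "T. indica": 28.86,
    "S. roseus": 37.78,
    "U. maydis": 31.20,
}


def gc_content(sequence):
    """Percentage of G and C among the A, C, G and T bases of a sequence."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def summarize(values):
    """Return (minimum, maximum, mean rounded to two decimals)."""
    vals = list(values)
    return min(vals), max(vals), round(mean(vals), 2)


print(summarize(GC_PERCENT.values()))  # (21.86, 37.78, 31.08)
```

The same `gc_content` helper could be applied to assembled contigs to separate low-G+C mitochondrial candidates from nuclear contigs, although the cutoff to use would have to be chosen per data set.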

subunit (rps3) was identified in only 9 out of the 15 mitochondrial genomes screened. The set of mitochondrial tRNAs ranged from 3 tRNA genes for P. chrysosporium to 28 in the M. perniciosa mitochondrial genome. All 14 genes encoding the hydrophobic subunits of the respiratory chain complexes could be localized in all studied genomes, except atp8 in P. chrysosporium, CO2 in P. placenta and nad6 in P. pachyrhizi and S. roseus. Similarly, genes encoding the 23S and 16S ribosomal RNAs of the large and small subunits of the ribosome (rnl and rns, respectively) were present in all the mtDNA genomes investigated, except rnl absent in P. chrysosporium. Such differences in gene patterns among different mitochondrial genomes reflected either the absence of one specific gene in the mitochondrial genome, as previously reported for the atp9 gene in Podospora anserina (Yan & Xu 2005), or the limitation of the data-mining and reconstruction methods applied to incomplete Trace Archive data sets. The mitochondrial genomes of P. placenta (CO2 and rps3 missing), S. roseus (nad6 and rps3 missing, atp9 incomplete) and P. chrysosporium (atp8, rnl and rps3 missing, CO1, CO2, CO3, cob, nad1 and nad5 incomplete) are most likely incomplete since they resulted from the concatenation of 4, 26 and 12 non-overlapping contigs, for a total size of 114.1, 45.4 and 40.1 kb, respectively (Table 2).

Characterization of 14 genes encoding the subunits of the respiratory chain complexes

Half of the 14 genes encoding the subunits of the respiratory chain complexes (CO1 and CO2, cob, nad1, nad2, nad4 and nad5) contained insert locations, ranging from one (in nad2) to 18 (in CO1), for a total of 42 (Fig. 2). One hundred and twelve of these inserts were identified as probable group I introns, among which 97 contained putative open reading frames (ORFs) encoding LAGLIDADG or GIY-YIG endonuclease proteins. Two group II introns were also found in this data set. The insert retrieved in the nad4 gene of P. graminis contained one ORF which encodes a group II intron maturase. Likewise, based on secondary structure prediction, the 11th CO1 insert found in P. placenta was identified as a group-II intron. Interestingly, this last putative group-II intron possesses an ORF which encodes a LAGLIDADG endonuclease identical to those found in group I introns. Such a family of group-II introns, e.g. encoding LAGLIDADG ORFs typical


Fig. 2 Number, position and characterization of inserts in seven mitochondrial genes encoding subunits of the respiratory chain complexes.

Table 3 Properties of the 14 genes encoding subunits of the respiratory chain complexes

Gene | No. of fungal strain considered | No. of taxon with insert(s) | No. of insert(s) location(s) | Length of continuous exons | Maximal insert(s) + exons length (species) | No. of significant copy(ies) [length variation (nt)] | No. of complete copy(ies) (species)

atp6 18 0 0– 718–865 Mit: 6 (20–760) 1 (S. roseus)


CGA: 5 (32–724) 1 (P. graminis)
atp8 15 0 0 144–62 Mit: 4 (20–162)
CGA: 4 (49–146) 1 (P. graminis)
1 (C. neoformans H99)
atp9 16 0 0 205–222 Mit: 3 (23–57)
CGA: 2 (43–125)
CO1 17 11 18 11–1583 16 446 (P. placenta) Mit: 222* (20–3910)
CGA: 55 (28–1579) 1 (C. neoformans H99)
CO2 15 3 3 68–759 2800 (M. perniciosa) Mit: 162*(20–708) 3 (S. roseus)
CGA: 8 (25–1481) 1 (P. graminis)
CO3 16 0 0 796–839 Mit: 1 (47)
CGA: 10 (28–104)
cob 18 17 9 10–1192 9158 (P. placenta) Mit: 279* (20–920)
CGA: 32 (28–2157) 1 (C. neoformans H99)
nad1 15 6 4 110–997 2106 (C. neoformans H99) Mit: 156* (20–133)
CGA: 9 (23–1509) 1 (C. neoformans H99)
nad2 15 1 1 711–1962 3087 (P. placenta) Mit: 29 (20–1283)
CGA: 14 (30–1502) 1 (C. neoformans H99)
nad3 15 0 0 326–375 Mit: 11 (20–375) 1 (P. graminis) 2 (S. roseus)
CGA: 2 (42–82)
nad4L 17 0 0 262–270 Mit: 6 (21–263) 2 (S. roseus)
CGA: 6 (30–183)
nad4 15 2 2 173–1467 3778 (P. graminis) Mit: 8 (23–1110)
CGA: 12 (29–173)
nad5 12 5 4 169–2064 5449 (P. placenta) Mit: 98*(20–1114)
CGA: 31 (21–1028)
nad6 15 0 0 579–757 885 (T. indica) Mit: 7 (21–24)
CGA: 4 (40–622) 1 (C. neoformans H99)

*Multiple copies of a short DNA fragment; CGA complete genome assembly.
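
As a reading aid for Table 3, the following minimal sketch filters the tabulated genes with the two criteria applied in the section on barcode candidates: no insert in any strain, and exons long enough for a barcode of about 600 nt (Min & Hickey 2007). The dictionary layout, the function names and the exact cut-off (upper bound of the exon-length range of at least 600 nt) are assumptions made for illustration; they are not part of the original analysis.

```python
# Values transcribed from Table 3:
# gene -> (strains with inserts, shortest exon length, longest exon length), in nt.
# The atp8 upper bound is read as 162 nt (assumption; the printed cell is ambiguous).
TABLE3 = {
    "atp6": (0, 718, 865),
    "atp8": (0, 144, 162),
    "atp9": (0, 205, 222),
    "CO1": (11, 11, 1583),
    "CO2": (3, 68, 759),
    "CO3": (0, 796, 839),
    "cob": (17, 10, 1192),
    "nad1": (6, 110, 997),
    "nad2": (1, 711, 1962),
    "nad3": (0, 326, 375),
    "nad4L": (0, 262, 270),
    "nad4": (2, 173, 1467),
    "nad5": (5, 169, 2064),
    "nad6": (0, 579, 757),
}


def insert_free(table):
    """Genes for which no strain carried an insert."""
    return sorted(gene for gene, (with_insert, _, _) in table.items()
                  if with_insert == 0)


def barcode_candidates(table, target_len=600):
    """Insert-free genes whose longest exon reaches the target length.

    Using the upper bound of the length range is a hypothetical cut-off,
    chosen because the text speaks of "approximately 600 nt".
    """
    return [gene for gene in insert_free(table)
            if table[gene][2] >= target_len]


if __name__ == "__main__":
    print(insert_free(TABLE3))
    print(barcode_candidates(TABLE3))
```

Under these assumptions the first filter keeps seven genes and the second keeps atp6, CO3 and nad6. A stricter cut-off on the lower bound of the range would drop nad6, so the threshold should be treated as a tunable parameter.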

of group-I introns has been observed in the rnl and rns genes of Agrocybe aegerita and Trimorphomyces papilionaceus, respectively (Toor & Zimmerly 2002). Patterns of presence/absence for these ORFs resulted in significant length variations for the different inserts found in those genes (from 200 bp for the S. roseus insert in cob to 4635 bp for the third C. neoformans (strain R265) insert in CO1).

As observed in other fungal mtDNA studies (Vaughn et al. 1995; Paquin et al. 1997), we confirmed the prevalence of introns in the CO1 and cob genes. We furthermore report on the wide distribution of large putative introns in other genes such as nad1, nad5 and CO2 (Fig. 2). This contrasts with an earlier report that CO2, along with atp9 and nad6, rarely contained introns in fungal mtDNA (Paquin et al. 1997).

Analysis of mtDNA copies of the 14 mitochondrial genes

Over 1180 putative copies of the 14 genes encoding subunits of the respiratory chain complexes considered in this study were localized following our fungal genome searches (Table 3). Size of these copies ranged from 20 nt to the full length of the query sequence. Two classes of copies were distinguishable. The first class comprised entire (mostly perfect) copies that matched almost 90% of the query sequence length and exhibited up to 95% identity (blastn E-value of 0.0). Members of this category were rare and confined to three species: C. neoformans (strain H99) (one complete copy of atp8, CO1, cob, nad1, nad2 and nad6), P. graminis (one complete copy of atp6, atp8, CO2 and nad3) and S. roseus (one complete copy of atp6, three of CO2 and two of nad3 and nad4L; Table 3). The six complete copies found in C. neoformans (strain H99) were distributed across four supercontigs (supercontig 1.124 for CO1 and atp8, 1.80 for nad1 and cob, 1.97 for nad2 and 1.76 for nad6). Strong similarities (99%) were observed between these supercontigs and homologous regions in the mitochondrial genome of C. neoformans strain H99. Similarly, all the copies found in P. graminis, except nad3, were clustered in one single supercontig (2.15), but in


this case, only a small part of this supercontig (0.2% of the total supercontig length) showed a strong homology (98%) with the P. graminis mitochondrial genome. Given the strong identities observed in these fungi between these supercontigs and homologous regions in their respective mitochondrial genomes, two hypotheses can be advocated. First, the presence of copies identical to mtDNA sequences in nuclear DNA can be attributed to recent DNA duplications and transfers from mitochondria to chromosomes (NUMTs). Such invasion of nuclear DNA by mtDNA appears as a continuous evolutionary process that results in the occurrence of complete, rearranged and/or fragmented copies of variable sizes, evenly distributed within and among chromosomes (Richly & Leister 2004; Bullerwell & Lang 2005; Pamilo et al. 2007). Examples of recent transfers have been observed in the ascomycete yeast Saccharomyces cerevisiae (Ricchetti et al. 1999) and in other nuclear genomes from eukaryotes, such as plants and humans (Mourrier et al. 2001; Stupar et al. 2001; IRGSP 2005). Second, mis- or unassembled parts of the mitochondrial genome sequence can be retrieved in the complete genome assembly. These technical problems might have arisen during the assembly of a whole genome sequence, either from insufficient read quality or from the presence of nucleotide repeats. In fact, alignment errors (computational or clone-induced from chimera) resulting in genome region misassemblies and producing these copies cannot be completely excluded. Similar computational errors should have been generated in the S. roseus mitochondrial genome assembly since perfect copies of the entire CO2, nad3 and nad4L genes were retrieved in this genome.

The second class of copies included short repetitive elements scattered throughout mitochondrial genomes. The mitochondrial genomes of C. cinereus, L. bicolor, M. perniciosa and P. placenta contained such elements that originated from the CO1, CO2, cob, nad1 and nad5 genes (Table 3). Most of these sequences (six in P. placenta, 34 in M. perniciosa and 62 in L. bicolor) of 20–92 nt were repeated from 15 to more than 30 times and exhibited short hairpin structures. Such secondary structures have previously been reported in mitochondrial fungal genomes and are described as double-hairpin elements (DHE) (Paquin et al. 1997).

Patterns of evolution

To better understand the pattern and rate of evolution of the mitochondrial loci, we assessed sequence divergence levels between 14 ingroup taxa (C. cinereus, C. neoformans (strain JEC21), C. neoformans (strain H99), C. neoformans (strain R265), L. bicolor, M. perniciosa, P. pachyrhizi, P. ostreatus, P. placenta, P. graminis, S. commune, S. roseus, T. indica, and U. maydis) for each of the 14 mitochondrial genes encoding the subunits of the respiratory chain complexes. We found high interspecific mutational variation in each gene as measured by K2P, with distances ranging from 0.36 (± 0.11) to 0.68 (± 0.23), and by percentage of variable sites (62.6% for atp9 and 77.4% for nad3; Fig. 3a). The substantial divergence values observed at higher taxonomic levels (e.g. between genera within Basidiomycota) suggest that these 14 mitochondrial genes may be potentially useful markers for the resolution of lower-level relationships (e.g. between species). Such high divergence levels were expected for mitochondrial protein-coding genes (Moriyama & Powell 1997; Hebert et al. 2003). As a comparison, at similar taxonomic levels (between genera within Basidiomycota), overall means of K2P distances observed for the ribosomal nuclear RNA genes (18S and 28S) were 0.12 (± 0.05) and 0.25 (± 0.05), respectively. In contrast, numerous insertions/deletions in the ITS sequences obtained for the same taxa prevented accurate nucleotidic alignment and determination of the molecular evolution pattern.

We used likelihood-ratio tests to find the best evolutionary model fitting each of the genes under examination. For each gene, evolutionary models that accounted for unequal base frequencies provided a significantly better fit to the data. Base compositional bias was lower for atp9 and CO1 (observed G + C content of 40% and 37%, respectively) relative to the 12 other mitochondrial genes examined, which exhibited a mean G + C content of 30% (± 3%). At this level, such A-T bias may result in an increase of homoplasy (Lockhart et al. 1994; Foster & Hickey 1999; Rokas et al. 2002). Limitations in the type of changes at several nucleotidic sites induced by homoplasy could result in asymmetrical patterns of among-base substitution rates (Collins et al. 1994). Only the 18S, 28S and atp8 genes fitted relatively simple models assuming two [Tamura–Nei (TrN) model for the 18S and 28S genes] or three substitution rates [transitional model (TIM) for atp8]. The 13 remaining genes fitted transversional (TVM) and general time reversible (GTR) models, which comprise five and six classes of substitution types, respectively (Fig. 3b). The complexity of these models was furthermore emphasized by a parameter accounting for rate heterogeneity across sites. In the five cases in which no invariable site parameter (I) was added to the ML model (atp8, atp9, nad3, nad4L and 28S; Fig. 3b), the shape parameter (alpha) of the gamma-distributed rates component was low, denoting strong rate heterogeneity among sites (i.e. a more uneven distribution of rates among sites). We noted a significant correlation between the alpha and I parameters (r² = 0.74; P < 0.05), which likely resulted from the fact that, as more sites were allocated to the invariant site category, the remaining sites showed less rate heterogeneity. Low alpha values correspond to genes with a few sites evolving at a very high rate, with the remaining sites changing at a very slow rate. Thus, given these model parameter tendencies (e.g. base composition heterogeneity, substitution bias and low alpha values), the substantial variation observed in these genes appears to be concentrated at a few sites. These sites are likely to have multiple substitutions with a


Fig. 3 Attributes of 14 genes encoding subunits of the mitochondrial respiratory chain complexes. (a) Kimura-2-parameter (K2P)
interspecific distances between pairwise sequences with the standard deviation plotted against the percentage of variable sites found for
each gene set; lines below K2P values indicate that the means were not significantly differentiated by a Tukey test. (b) Gamma-shape
parameters, proportion of invariable sites, and among-bases substitution rates components of the best-fitting models for the 14 genes
considered. The classes of substitution types varied according to the colours depicted in the bottom right box. The upper right box contains
the lengths of the different data sets.
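
The K2P distances plotted in Fig. 3a follow Kimura (1980): with P and Q the proportions of transitional and transversional differences between two aligned sequences, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The short sketch below implements this formula for one pair of sequences. The function name and the decision to skip any position carrying a gap or an ambiguity code are illustrative assumptions; the software actually used for the published values may treat such sites differently.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: positions with gaps or ambiguity codes are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable positions")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # (saturated) for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one transition in 10 sites
```

Averaging this quantity over all pairs of taxa for a gene gives a mean interspecific distance of the kind reported in the figure, under the assumptions stated above.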

resulting reduction in the number of variable and/or alternative character states available. This translates into an increased sensitivity to homoplasy (Cummings et al. 1995; Ballard & Whitlock 2004). Such mutational saturation tendency raised the possibility that convergence in base composition between unrelated taxa could lead to incorrect species delimitations (Roe & Sperling 2007).

Efficiency of the mitochondrial DNA barcode candidates

Considering the absence of introns in the in silico analysis and the potential for high divergence levels, seven fungal mitochondrial genes, atp6, atp8, atp9, CO3, nad3, nad4L and nad6, had potential as DNA barcodes. With an optimal length for a barcode of approximately 600 nt (Min & Hickey 2007), the set is reduced to three genes: atp6, CO3 and nad6. Atp6 was successfully used for systematics in the Boletales (Agaricomycotina, Basidiomycota) (Kretzer & Bruns 1999) and is promising for other groups of Agaricomycotina (J.M. Moncalvo, unpublished). However, the use of this gene in Ascomycota proved to be problematic and was abandoned in the Assembling the Fungal Tree of Life (AFTOL) initiative (V. Hofstetter, personal communication to J.M.M.). Multiple divergent atp6 sequences were recovered from several lichen strains. These divergent atp6 sequences were hypothesized to originate from autonomously replicating plasmid-like DNA containing the atp6 gene, as observed in the maize pathogen Cochliobolus heterostrophus (Lin et al. 1988; Hofstetter et al. 2004). The use of CO3 in Boletales was abandoned since it contained introns that interfered with PCR amplification (Kretzer & Bruns 1999). The NADH dehydrogenase subunit genes are absent from the mitochondrion of several yeasts (Ascomycota) (Bullerwell et al. 2003). In filamentous fungi, few studies have considered nad6 as a tool for fungal systematics and taxonomy. These studies emphasized the inadequacy of individual mitochondrial genes to resolve species phylogenies (Kouvelis et al. 2004; Pantou et al. 2006). Mitochondrial DNA (especially the intergenic sequences of the


Table 4 Efficiency of the ITS, 28S and four mitochondrial loci as DNA barcodes

        DNA barcode amplification   DNA barcode efficiency

        No. of    % amplification   Total length used   K2P distance,         K2P distance,         Species
        primer    success           for K2P distance    intraspecific         interspecific         resolution*
Marker  pairs                       (bp)                comparisons (range)   comparisons (range)   (%)

Chrysomyxa data set (10 species/23 isolates)
28S     1         100               702                 0.0001 (± 0.0003)     0.04 (± 0.07)         90
ITS     2         100               859                 0.001 (± 0.001)       0.08 (± 0.1)          90
CO1     2         29                292                 0.001 (± 0.000)       0.08 (± 0.07)         50
nad6    1         100               451                 0.000 (± 0.01)        0.06 (± 0.04)         90
CO3     1         100               517                 0.001 (± 0.001)       0.04 (± 0.02)         90
atp6    1         91                583                 0.001 (± 0.001)       0.04 (± 0.03)         70

Melampsora data set (5 species/15 isolates)
28S     1         100               738                 0 (± 0)               0.034 (± 0.01)        80
ITS     1         100               629                 0.0004 (± 0.0008)     0.07 (± 0.03)         80
CO1     2         100               744                 0.0006 (± 0.001)      0.003 (± 0.002)       60
nad6    1         100               500                 0 (± 0)               0.005 (± 0.004)       60
CO3     1         100               665                 0 (± 0)               0.003 (± 0.003)       20
atp6    1         100               652                 0 (± 0)               0.0006 (± 0.0009)     20

*A species is considered resolved if all of its constituent sequences form a monophyletic cluster and are distinct from other sequences.
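
The resolution criterion in the footnote of Table 4 can be made concrete with a small sketch. It assumes a rooted tree written as nested tuples of isolate names and a dictionary assigning each isolate to a species; both structures, the function names and the isolate labels are hypothetical. Only the monophyly half of the criterion is checked here; whether the sequences of a species are also distinct from all others would need the distance matrix as well.

```python
def collect_clades(tree, clades):
    """Add the leaf set of every clade of a nested-tuple tree to `clades`.

    A leaf is a string (isolate name); an internal node is a tuple of subtrees.
    Returns the leaf set of `tree` itself.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(
            *[collect_clades(child, clades) for child in tree]
        )
    clades.add(leaves)
    return leaves


def percent_resolved(tree, species_of):
    """Percentage of species whose isolates form exactly one clade of the tree."""
    clades = set()
    collect_clades(tree, clades)

    isolates_by_species = {}
    for isolate, species in species_of.items():
        isolates_by_species.setdefault(species, set()).add(isolate)

    resolved = sorted(
        species
        for species, isolates in isolates_by_species.items()
        if frozenset(isolates) in clades
    )
    return 100.0 * len(resolved) / len(isolates_by_species), resolved


if __name__ == "__main__":
    # Hypothetical example: species B is split across the tree.
    tree = ((("A1", "A2"), ("B1", "C1")), "B2")
    species_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C"}
    print(percent_resolved(tree, species_of))
```

Note that a species represented by a single isolate is trivially monophyletic in this sketch, so the percentage is an upper bound on the resolution defined in the footnote.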

NADH dehydrogenase subunit genes) was nevertheless considered as a valuable tool for the discrimination of closely related species within Ascomycota (Kouvelis et al. 2004; Pantou et al. 2006).

Following our in silico results, we tested the efficiency of four of the 14 mitochondrial genes for DNA barcoding. First, we tried to amplify the CO1-5′ region by PCR since this locus was initially proposed as the universal barcode system for eukaryotes. Then, we compared the efficiency of the nad6, CO3 and atp6 genes with the 28S, ITS and CO1-5′ loci. To assess the potential of these loci as DNA barcodes, we generated a data set for two fungal genera with taxonomic difficulties (Table S1). The species complex Chrysomyxa ledi de Bary includes several cryptic species. At least six of them are distinguishable by their spore morphometry and/or uredinial host specificity (Crane 2001). A collection of Melampsora species sampled on aspen and white poplars was also included in this study. This data set includes the M. populnea species complex composed of at least four species distinguished through aecial host specificity, but morphologically similar (Pei & Shang 2005).

The number of primer pairs required for successful PCR amplifications and sequencing varied according to the locus and the data set (either Melampsora spp. or Chrysomyxa spp.) targeted (Table 4). In general, PCR amplification results obtained from these two rust genera were congruent with the results obtained in the in silico analyses. The PCR amplification of a ~600-bp product from the 5′-end of the CO1 gene generally required more than a single primer pair due to the occurrence of numerous large introns. In contrast, no intronic regions were found in the nad6 and CO3 genes among the 15 fungi considered in the preliminary in silico analysis, and successful PCR amplifications of these genes were obtained using a single primer pair for the Melampsora and Chrysomyxa strains considered here. The amplification and sequencing of the ITS, CO1 and atp6 loci in the Chrysomyxa data set was particularly complicated. Two primer pairs were required to obtain readable ITS sequences. Even using multiple primer pairs in different amplification reactions, a maximum of 292 bp was obtained for the CO1 gene for 29% of the specimens tested (Table 4). We amplified more than 650 bp of the atp6 gene in the Chrysomyxa


data set except for two species: C. ledi and C. rhododendri. This failure might have occurred for these two species because of (i) the occurrence of an intron or (ii) the presence of polymorphisms at the primer sites, although these had been designed in a conserved part of the atp6 gene.

Neither the nuclear (ITS and 28S) nor the mitochondrial loci fully resolved the different rust taxa under study. Despite this, ITS and 28S provided greater taxonomic resolution than the mitochondrial genes (Table 4). Although nad6 and CO3 provided the same taxonomic resolution as ITS and 28S loci in the Chrysomyxa data set (90% of the species resolved; Table 4), these mitochondrial loci resulted in lower taxonomic resolution than the ribosomal loci in the Melampsora data set (from 20 to 60%).

Our work demonstrates that the sequences currently available in public databases are useful to conduct in silico molecular studies for a large taxonomic group such as Basidiomycota. We initially postulated that such in silico analyses could constitute a helpful resource for facilitating the choice of genes with sufficient degree of divergence at the appropriate taxonomic scale. This approach allowed us to anticipate difficulties for in vivo PCR amplification of mitochondrial genes in this group of fungi. We predicted that numerous sporadic introns should occur in mitochondrial genes across several genera in the Basidiomycota and could compromise the usefulness of these genes for DNA barcoding. Furthermore, we demonstrated that several fungal mitochondrial genes, including CO1 that had been proposed for DNA barcoding, exhibit a range of substantial interspecific divergence which constitutes one of the fundamental requirements for a species-level DNA identification system. Despite such potential for high divergence levels, the taxonomical resolution observed in mitochondrial genes varies depending on the combination locus/group of taxa considered. Finally, our comparison of four of these genes, CO1, atp6, CO3 and nad6, with nuclear ribosomal regions (ITS and 28S) in two rust data sets (including closely related species), revealed that ITS and 28S offer a better taxonomic resolution than the mitochondrial loci in spite of the lower potential we initially observed for the 28S locus in our in silico analyses.

Acknowledgements

The authors acknowledge David L. Joly for help with bioinformatics and Franck Orsupetru Stefani and Philippe Tanguay for comments on the manuscript. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Genome Canada through their funding of the Canadian Barcode of Life Network and the Fungal DNA Barcoding Initiative.

Conflict of interest statement

The authors have no conflict of interest to declare and note that the funders of this research had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Aanen DK, Kuyper TW, Hoekstra RF (2001) A widely distributed ITS polymorphism within a biological species of the ectomycorrhizal fungus Hebeloma velutipes. Mycological Research, 105, 284–290.
Aime CM, Matheny BP, Henk DA et al. (2006) An overview of the higher level classification of Pucciniomycotina based on combined analyses of nuclear large and small subunit rDNA sequences. Mycologia, 98, 896–905.
Ballard JWO, Whitlock MC (2004) The incomplete natural history of mitochondria. Molecular Ecology, 13, 729–744.
Bateman A, Coin L, Durbin R et al. (2004) The Pfam protein families database. Nucleic Acids Research, 32, D138–D141.
Bridge PD, Spooner BM, Roberts PJ (2005) The impact of molecular data in fungal systematics. Advances in Botanical Research, 42, 34–67.
Brudno M, Malde S, Poliakov A et al. (2003) Glocal alignment: finding rearrangements during alignment. Bioinformatics, 19, i54–i62.
Bruns TD (2001) ITS reality. Inoculum, 52, 2.
Bruns TD, White TJ, Taylor TJ (1991) Fungal molecular systematics. Annual Review of Ecology and Systematics, 22, 525–564.
Bullerwell CE, Lang FB (2005) Fungal evolution: the case of the vanishing mitochondrion. Current Opinion in Microbiology, 8, 362–369.
Bullerwell CE, Leigh J, Forget L, Lang FB (2003) A comparison of three fission yeast mitochondrial genomes. Nucleic Acids Research, 31, 759–768.
Burger G, Gray MW, Lang FB (2003) Mitochondrial genomes: anything goes. Trends in Genetics, 19, 709–716.
Chillali M, Idder-Ighili H, Guillaumin JJ et al. (1998) Variation in the ITS and IGS regions of ribosomal DNA among the biological species of European Armillaria. Mycological Research, 102, 533–540.
Collins TM, Wimberger PH, Naylor GJP (1994) Compositional bias, character-state bias, and character-state reconstruction using parsimony. Systematic Biology, 43, 482–496.
Coprinus Cinereus Sequencing Project. Broad Institute of MIT and Harvard. http://www.broad.mit.edu.
Crane PE (2001) Morphology, taxonomy, and nomenclature of the Chrysomyxa ledi complex and related rust fungi on spruce and Ericaceae in North America and Europe. Canadian Journal of Botany, 79, 957–982.
Cryptococcus neoformans Genome Project, B3501A assembly data, Stanford Genome Technology Center, funded by the NIAID/NIH under cooperative agreement AI47087, and The Institute for Genomic Research, funded by the NIAID/NIH under cooperative agreement U01 AI48594. http://www-sequence.stanford.edu/group/C.neoformans/.
Cryptococcus Neoformans Serotype B Sequencing Project. Broad Institute of MIT and Harvard. http://www.broad.mit.edu.
Cummings MP, Otto SP, Wakeley J (1995) Sampling properties of DNA sequence data in phylogenetic analysis. Molecular Biology and Evolution, 12, 814–822.
Cywinska A, Hunter FF, Hebert PDN (2006) Identifying Canadian mosquito species through DNA barcodes. Medical and Veterinary Entomology, 20, 413–424.
Dalgaard JZ, Klar AJ, Moser MJ et al. (1997) Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases and identification of an intein that encodes a site-specific endonuclease of the HNH family. Nucleic Acids Research, 25, 4626–4638.


Eddy SR (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
Foster PG, Hickey DA (1999) Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. Journal of Molecular Evolution, 48, 284–290.
Gardes M, Bruns TD (1993) ITS primers with enhanced specificity for basidiomycetes-application to the identification of mycorrhizae and rusts. Molecular Ecology, 2, 113–118.
Grasso V, Sierotzki H, Gisi U (2006) Relatedness among agronomically important rusts based on mitochondrial cytochrome b gene and ribosomal ITS sequences. Journal of Phytopathology, 154, 110–118.
Guarro J, Gene J, Stchigel AM (1999) Developments in fungal taxonomy. Clinical Microbiology Reviews, 12, 454–500.
Hajibabaei M, deWaard JR, Ivanova N et al. (2005) Critical factors for assembling a high volume of DNA barcodes. Philosophical Transactions of the Royal Society B: Biological Sciences, 360, 1959–1967.
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41, 95–98.
Heath PJ, Stephens KM, Monnat RJ, Stoddard BL (1997) The structure of I-CreI, a Group I intron-encoded homing endonuclease. Nature Structural Biology, 4, 468–476.
Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Philosophical Transactions of the Royal Society B: Biological Sciences, 270, 313–321.
Hebert PDN, Gregory TR (2005) The promise for DNA barcoding for taxonomy. Systematic Biology, 54, 852–859.
Hebert PDN, Penton EH, Burns JM, Janzen DH, Hallwachs W (2004a) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proceedings of the National Academy of Sciences, USA, 101, 14812–14817.
Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004b) Identification of birds through DNA barcodes. Public Library of Science, Biology, 2, 1657–1663.
Hibbett DS, Binder M, Bischoff JF et al. (2007) A higher-level phylogenetic classification of the Fungi. Mycological Research, 111, 509–547.
Hofstetter V, Miadlikowska J, Gueidan C et al. (2004) What do protein-coding genes (ATP6, EF1-alpha and RNA polymerase II) bring to molecular systematics of lichens? In: Book of Abstracts of the 5th IAL Symposium: Lichens in Focus (ed. Press TU), p. 15. Tartu University Press, Tartu, Estonia.
Huang X, Madan A (1999) CAP3: a DNA sequence assembly program. Genome Research, 9, 868–877.
Huang X, Yang S-P, Chinwalla AT et al. (2006) Application of a superword array in genome assembly. Nucleic Acids Research, 34, 201–205.
Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5, 299–314.
IRGSP (2005) The map-based sequence of the rice genome. Nature, 436, 793–800.
James T, Kauff F, Schoch CL et al. (2006) Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature, 443, 818–822.
Katsu M, Kidd S, Ando A et al. (2004) The internal transcribed spacers and 5.8S rRNA gene show extensive diversity among isolates of the Cryptococcus neoformans species complex. FEMS Yeast Research, 4, 377–388.
Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.
Kirk PM, Cannon PF, David JC, Stalpers JA (2001) Dictionary of the Fungi, 9th edn. CABI Publishing, Wallingford, UK.
Klich MA, Mullaney EJ (1992) Molecular methods for identification and taxonomy of filamentous fungi. In: Handbook of Applied Mycology (ed. Arora DK), pp. 35–57. Banaras Hindu University, Varanasi, India.
Kouvelis VN, Ghikas DV, Typas MA (2004) The analysis of the complete mitochondrial genome of Lecanicillium muscarium (synonym Verticillium lecanii) suggests a minimum common gene organization in mtDNAs of Sordariomycetes: phylogenetic implications. Fungal Genetics and Biology, 41, 930–940.
Kretzer AM, Bruns TD (1999) Use of atp6 in fungal phylogenetics: an example from the Boletales. Molecular Phylogenetics and Evolution, 13, 483–492.
Lang FB, Laforest M-J, Burger G (2007) Mitochondrial introns: a critical view. Trends in Genetics, 23, 119–125.
Lim YW, Sturrock R, Leal I et al. (2008) Distinguishing homokaryons and heterokaryons in Phellinus sulphurascens using pairing tests and ITS polymorphisms. Antonie Van Leeuwenhoek International Journal of General and Molecular Microbiology, 93, 99–110.
Lin JJ, Garber RC, Yoder OC (1988) Nucleotide sequence of a fungal plasmid-like DNA containing the mitochondrial ATPase subunit 6 gene. Nucleic Acids Research, 16, 9875.
Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution, 11, 605–612.
Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research, 25, 955–964.
Lutzoni F, Kauff F, Cox JC et al. (2004) Assembling the fungal tree of life: progress, classification, and evolution of subcellular traits. American Journal of Botany, 91, 1446–1480.
Martin F, Aerts A, Ahren D et al. (2008) The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis. Nature, 452, 88–92.
Martin FN, Bensasson D, Tyler BM (2007) Mitochondrial genome sequences and comparative genomics of Phytophthora ramorum and P. sojae. Current Genetics, 51, 285–296.
Matheny PB, Wang Z, Binder M et al. (2007) Contributions of rpb2 and tef1 to the phylogeny of mushrooms and allies (Basidiomycota, Fungi). Molecular Phylogenetics and Evolution, 43, 430–451.
Mayor C, Brudno M, Schwartz JR et al. (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics Applications Note, 16, 1046–1047.
Min XJ, Hickey DA (2007) Assessing the effect of varying sequence length on DNA barcoding of fungi. Molecular Ecology Notes, 7, 365–373.
Mohr G, Perlman PS, Lambowitz AM (1993) Evolutionary relationships among group II intron-encoded proteins and identification of a conserved domain that may be related to maturase function. Nucleic Acids Research, 21, 4991–4997.
Moriyama EN, Powell JR (1997) Synonymous substitution rates in Drosophila: mitochondrial versus nuclear genes. Journal of Molecular Evolution, 45, 378–391.
Mourrier T, Hansen AJ, Willerslev E, Arctander P (2001) The human genome project reveals a continuous transfer of large mitochondrial fragments to the nucleus. Molecular Biology and Evolution, 18, 1833–1837.
O’Donnell K, Cigelnik E (1997) Two divergent intragenomic rDNA ITS2 types within a monophyletic lineage of the fungus Fusarium


are nonorthologous. Molecular Phylogenetics and Evolution, 7, 103–116.
Okabe I, Matsumoto N (2003) Phylogenetic relationship of Sclerotium rolfsii (teleomorph Athelia rolfsii) and S. delphinii based on ITS sequences. Mycological Research, 107, 164–168.
Pamilo P, Viljakainen L, Vihavainen A (2007) Exceptionally high density of numts in the honeybee genome. Molecular Biology and Evolution, 24, 1340–1346.
Pantou M, Kouvelis V, Typas M (2006) The complete mitochondrial genome of the vascular wilt fungus Verticillium dahliae: a novel gene order for Verticillium and a diagnostic tool for species identification. Current Genetics, 50, 125–136.
Paquin B, Laforest M-J, Forget L et al. (1997) The fungal mitochondrial genome project: evolution of fungal mitochondrial genomes and their gene expression. Current Genetics, 31, 380–395.
Paquin B, O’Kelly CJ, Lang FB (1995) Intron-encoded open reading frame of the GIY-YIG subclass in a plastid gene. Current Genetics, 28, 97–99.
Pei MH, Shang YZ (2005) A brief summary of Melampsora species on Populus. In: Rust Disease of Willow and Poplar (eds Pei MH, McCracken AR), pp. 51–61. CABI Publishing, Wallingford, UK.
Posada D, Crandall KA (1998) ModelTest: testing the model of DNA substitution. Bioinformatics, 14, 817–818.
Puccinia Graminis Sequencing Project. Broad Institute of MIT and Harvard. http://www.broad.mit.edu.
Ricchetti M, Fairhead C, Dujon B (1999) Mitochondrial DNA repairs double-strand breaks in yeast chromosomes. Nature, 402, 96–100.
Richly E, Leister D (2004) NUMTs in sequenced eukaryotic genomes. Molecular Biology and Evolution, 21, 1081–1084.
Roe AD, Sperling FAH (2007) Patterns of evolution of mitochondrial cytochrome c oxidase I and II DNA and implication for DNA barcoding. Molecular Phylogenetics and Evolution, 44, 325–345.
Rokas A, Nylander JAA, Ronquist F, Stone GN (2002) A maximum-likelihood analysis of eight phylogenetic markers in gallwasps (Hymenoptera: Cynipidae): implications for insect phylogenetic studies. Molecular Phylogenetics and Evolution, 22, 206–209.
Seifert KA, Samson RA, DeWaard JR et al. (2007) Prospects for fungus identification using CO1 DNA barcodes, with Penicillium as a test case. Proceedings of the National Academy of Sciences of the USA, 104, 3901–3906.
Skouboe P, Frisvad JC, Taylor JW et al. (1999) Phylogenetic analysis of nucleotide sequences from the ITS region of terverticillate Penicillium species. Mycological Research, 103, 873–881.
Thompson JD, Higgins DG, Gibson TJ (1994) ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22, 4673–4680.
Tian G-L, Michel F, Macadre C, Slonimski PP, Lazowska J (1991) Incipient mitochondrial evolution in yeasts II. Journal of Molecular Biology, 218, 747–760.
Tian C-M, Shang Y-Z, Zhuang J-Y, Wang Q, Kakishima M (2004) Morphological and molecular phylogenetic analysis of Melampsora species on poplars in China. Mycoscience, 45, 56–66.
Toor N, Zimmerly S (2002) Identification of a family of group II introns encoding LAGLIDADG ORFs typical of group I introns. RNA, 8, 1373–1377.
Vaughn JC, Mason MT, Sper-Whitis GL, Kulman P, Palmer JD (1995) Fungal origin by horizontal transfer of a plant mitochondrial group I intron in the chimeric Cox1 gene of Peperomia. Journal of Molecular Evolution, 41, 563–572.
Vilgalys R, Hester M (1990) Rapid genetic identification and mapping of enzymatically amplified ribosomal DNA from several Cryptococcus species. Journal of Bacteriology, 172, 4238–4246.
Waugh J (2007) DNA barcoding in animal species: progress, potential and pitfalls. Bioessays, 29, 188–197.
White TJ, Bruns TS, Lee S, Taylor JW (1990) Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In: PCR Protocols: A Guide to Methods and Applications (eds Innis MA, Gelfand DH, Sninsky JJ, White TJ), pp. 315–322. Academic Press, New York.
Xu J, Singh RS (2005) The inheritance of organelle genes and genomes: patterns and mechanisms. Genome, 48, 951–958.
Yan Z, Xu J (2005) Fungal mitochondrial inheritance and evolution. In: Evolutionary Genetics of Fungi (ed. Xu J), pp. 221–252. Horizon Scientific Press, Wymondham, UK.
Zeng JS, De Hoog GS (2008) Exophiala spinifera and its allies: diagnostics from morphology to DNA barcoding. Medical Mycology, 46, 193–208.

Supporting information

Additional supporting information may be found in the online version of this article:

Fig. S1 Nucleotidic alignments obtained for 14 mitochondrial genes encoding subunits of the respiratory chain complexes. The sequences
Smith M, Douhan G, Rizzo D (2007) Intra-specific and intra- were obtained from available Basidiomycete genomic resources as
sporocarp ITS variation of ectomycorrhizal fungi as assessed by detailed in the Material and methods section.
rDNA sequencing of sporocarps and pooled ectomycorrhizal
Fig. S2 Dendrograms constructed with the neighbour-joining
roots from a Quercus woodland. Mycorrhiza, 18, 15–22.
algorithm based on the K2P distance matrices of the ITS, 28S, CO1,
Song H, Buhay JE, Whiting MF, Crandall KA (2008) Many species
atp6, CO3 and nad6 nucleotidic-sequence alignment for the Mela-
in one: DNA barcoding overestimates the number of species
mpsora and the Chrysomyxa data sets.
when nuclear mitochondrial pseudogenes are coamplified. Pro-
ceedings of the National Academy of Sciences, USA, 105, 13486–13491. Fig. S3 G + C content comparisons between the nuclear and mito-
Stoll M, Piepenbring M, Begerow D, Oberwinkler F (2003) Molecular chondrial contigs considered in this study. (A) Plot of the G + C
phylogeny of Ustilago and Sporisorium species (Basidiomycota, content of each contig (nuclear and mitochondrial) considered in
Ustilaginales) based on internal transcribed spacer (ITS) sequences. the genome assemblies; (B) Box plot of G + C content in mitochon-
Canadian Journal of Botany, 81, 976–984. drial and nuclear contigs considered in the genome assemblies.
Stupar RM, Lilly JW, Town CD et al. (2001) Complex mtDNA con-
Table S1 Information about Chrysomyxa and Melampsora speci-
stitutes an approximate 620-kb insertion on Arabidopsis thaliana
mens used in this study
chromosome 2: implication of potential sequencing errors caused
by large-unit repeats. Proceedings of the National Academy of Please note: Wiley-Blackwell are not responsible for the content or
Sciences, USA, 98, 5099–5103. functionality of any supporting materials supplied by the authors.
Swofford DJ (2003) PAUP 4.0 User's manual: Phylogenetic Analysis Using Any queries (other than missing material) should be directed to
Parsimony. Sinauer Associates Inc., Sunderland, Massachusetts. the corresponding author for the article.

© 2009 Blackwell Publishing Ltd and Crown in the right of Canada
