Abstract
PlasmidFinder and in silico plasmid multilocus sequence typing (pMLST) are two easy-to-use web tools
for the detection and characterization of plasmid sequences in whole-genome sequencing (WGS) data from
Enterobacteriaceae. These tools have been adopted worldwide and facilitate plasmid detection and typing
based on draft genomes of multidrug-resistant Enterobacteriaceae. The PlasmidFinder database currently
includes 133 unique plasmid replicon sequences. It was built starting from 126 sequences designed on fully
sequenced plasmids available in the NCBI nucleotide database in 2014, and it has been continuously updated
to include novel replicons detected in more recently sequenced plasmids associated with the family
Enterobacteriaceae. PlasmidFinder can be used for replicon sequence analysis of both raw and assembled
sequencing data. For pMLST analysis, a weekly updated database was generated from www.pubmlst.org
and integrated into a web tool called in silico pMLST.
1 Introduction
Fernando de la Cruz (ed.), Horizontal Gene Transfer: Methods and Protocols, Methods in Molecular Biology, vol. 2075,
https://doi.org/10.1007/978-1-4939-9877-7_20, © Springer Science+Business Media, LLC, part of Springer Nature 2020
286 Alessandra Carattoli and Henrik Hasman
plasmid families, even when the Inc phenotype has not been
formally confirmed by conjugation against the appropriate reference
plasmids.
With the recent rapid increase in whole-genome and whole-plasmid
sequence data generated by high-throughput sequencing platforms,
there arose a need to translate the Inc typing and PBRT-based
classification schemes into a tool that can identify replicon content
in raw sequence data or in contigs generated by high-throughput
sequencing of entire genomes. Replicon sequences targeted by PBRT
were used to build the first collection of replicon sequences for the
PlasmidFinder database [3]. Analysis of the nucleotide sequences
available in GenBank showed that additional replicon sequences were
needed, bringing the total to the current 110 PlasmidFinder
Enterobacteriaceae probes, which recognize (at >95% nucleotide
identity and >96% coverage) almost all complete sequences of large
plasmids (>20 kb in size) available in the NCBI database. When a
replicon cannot be assigned to a previously existing Inc group, the
plasmid is assigned to a homology group, using the replication
initiation protein gene as the reference for the new plasmid type.
Among the plasmids that can be present in WGS data, a large
majority consists of small, ColE-like plasmids that could not be
detected or classified by Inc typing or PBRT [3, 4]. For these
plasmids, multiple phylogenetic analyses of the repA, RNAI, and oriT
sequences identified 23 sequences that, using >80% nucleotide
identity and 96% coverage as criteria, detect the small plasmids
present in WGS data and classify them into discrete plasmid groups.
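The two sets of matching criteria can be expressed as a simple filter on alignment hits. The sketch below is an illustration only, not PlasmidFinder's code: the `Hit` fields, the `THRESHOLDS` table, the strict ">" comparisons, and the definition of coverage as the fraction of the database replicon spanned by the alignment are assumptions. The replicon names are used only as labels and the numbers are invented.

```python
from dataclasses import dataclass

# Identity/coverage thresholds quoted in the text (percent). Treating both
# as strict lower bounds (">") is an assumption made for this sketch.
THRESHOLDS = {
    "large": (95.0, 96.0),  # replicons of large plasmids (>20 kb)
    "small": (80.0, 96.0),  # replicons of small ColE-like plasmids
}


@dataclass
class Hit:
    replicon: str         # database replicon the query aligned to
    identity: float       # percent nucleotide identity of the alignment
    aligned_length: int   # template bases spanned by the alignment
    template_length: int  # full length of the database replicon

    @property
    def coverage(self) -> float:
        """Percent of the database replicon covered by the alignment."""
        return 100.0 * self.aligned_length / self.template_length


def is_replicon_call(hit: Hit, group: str) -> bool:
    """True if the hit clears both thresholds for the given replicon group."""
    min_identity, min_coverage = THRESHOLDS[group]
    return hit.identity > min_identity and hit.coverage > min_coverage


if __name__ == "__main__":
    # Invented example hits
    for hit, group in [
        (Hit("IncFII", 98.5, 260, 261), "large"),
        (Hit("Col440I", 85.0, 118, 120), "small"),
        (Hit("Col440I", 85.0, 100, 120), "small"),
    ]:
        print(hit.replicon, group, round(hit.coverage, 1), is_replicon_call(hit, group))
```

The same hit can pass the looser "small" criteria and fail the "large" ones, which is why the identity threshold has to be chosen per replicon group.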
In total, 132 replicon sequences (109 recognizing large plasmids and
23 recognizing small plasmids) are currently included in the
PlasmidFinder Enterobacteriaceae database
(https://cge.cbs.dtu.dk/services/PlasmidFinder/). By BLASTN,
these 132 PlasmidFinder sequences recognize almost 9000 large and
more than 11,000 small complete or partial plasmid sequences in the
NCBI nucleotide database (as of December 2018).
Because plasmid families do not occur at the same frequency, and
some families are far more prevalent than others, sequence-based
typing schemes were devised to identify plasmid types within these
families. IncF, IncI1, IncN, IncHI2, IncHI1, and IncA/C plasmids are
currently subtyped by plasmid multilocus sequence typing (pMLST;
http://pubmlst.org/plasmid/) [5–9]. For pMLST analysis, a weekly
updated database was generated from www.pubmlst.org and
integrated into a web tool called in silico pMLST. The PlasmidFinder
and pMLST web tools make it possible to screen WGS data obtained
from any kind of sequencing platform, without particular
bioinformatics skills, and to retrieve plasmid information for use in
clinical and epidemiological investigations.
PlasmidFinder and In Silico pMLST: Identification and Typing of Plasmid. . . 287
2 Materials
3 Method
3.2 pMLST Single Analysis
To execute a sequence-type prediction using the pMLST web
server, the pMLST scheme matching the plasmid query should be
selected. pMLST distinguishes six plasmid schemes: IncA/C, IncF,
IncHI1, IncHI2, IncI1, and IncN. Raw data in FASTQ format or
preassembled partial or complete genomes in FASTA format can be
uploaded (see Note 1). To input the sequences, a single FASTA file
on the local disk can be uploaded using the applet. For successful
typing, a partial genome must, at a minimum, contain all the loci
required by the selected pMLST scheme, concatenated in one FASTA
file.
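Because the applet accepts a single FASTA file, contigs or loci stored in several files have to be merged first. The sketch below is one way to do this locally; the function names and the example file names are hypothetical, and the code is not part of the pMLST service.

```python
from pathlib import Path


def merge_fasta_texts(texts: list[str]) -> str:
    """Join several FASTA texts into one multi-FASTA string.

    Blank lines are dropped and every line is newline-terminated, so a
    file lacking a final newline cannot run into the next file's header.
    """
    lines = []
    for text in texts:
        for line in text.splitlines():
            if line.strip():
                lines.append(line.rstrip())
    return "".join(line + "\n" for line in lines)


def count_records(fasta: str) -> int:
    """Number of FASTA records, i.e. header lines starting with '>'."""
    return sum(1 for line in fasta.splitlines() if line.startswith(">"))


def merge_fasta_files(inputs: list[Path], output: Path) -> int:
    """Write all records from `inputs` into `output`; return the record count."""
    merged = merge_fasta_texts([path.read_text() for path in inputs])
    output.write_text(merged)
    return count_records(merged)


if __name__ == "__main__":
    # Hypothetical file names
    n = merge_fasta_files(
        [Path("contigs_part1.fasta"), Path("contigs_part2.fasta")],
        Path("query.fasta"),
    )
    print(f"wrote {n} records")
```

The sketch does not check for duplicate header names across input files; contigs with identical names would need renaming before upload.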
The green “Upload” button starts the job. The status of the job
(“queued” or “running”) is displayed and continuously updated until
the job terminates and the server output page appears in the
browser. Optionally, an e-mail address can be entered to receive a
notification as soon as the results are ready.
3.3 PlasmidFinder and pMLST in the Bacterial Analysis Pipeline: Batch Upload
PlasmidFinder and pMLST are also included in the Bacterial Analysis
Pipeline: Batch Upload (https://cge.cbs.dtu.dk/services/cge/). The
CGE Bacterial Analysis Pipeline executes a workflow of services with
predefined parameters and stores the submitted data and results in a
database at the user’s disposal. This pipeline can process only
preassembled isolates; therefore, contig files in FASTA format should
be uploaded. The pipeline was benchmarked using the datasets
previously used to test the individual services.
The plasmid services included in the Bacterial Analysis Pipeline are
PlasmidFinder-1.2 and pMLST-1.4.
Fig. 1 PlasmidFinder output. Overview of the PlasmidFinder V2.0 output on the web page. Dark green
indicates a perfect match for a given plasmid: the %Identity is 100 and the sequence in the genome covers
the entire length of the plasmid in the database. Light green indicates a warning due to a non-perfect
match. Grey indicates a warning due to a non-perfect match in which the query length is shorter than the
plasmid replicon length. Red indicates that no plasmid with a match above the given threshold was found
3.4 PlasmidFinder Output
Once the PlasmidFinder server has finished running the submitted
job, it will display a graphical output similar to the example in
Fig. 1.
Output data include the name of the input file(s) uploaded by the
user and the selected threshold for minimum percent identity (%ID)
between the sequence in the genome of the input isolate and the
plasmid in the database. The output table has seven columns:
(1) the replicon, expressed where available as the Inc group, against
which the input genome has been aligned; (2) the percent identity of
the alignment between the best-matching plasmid in the database and
the corresponding sequence in the input genome (a perfect alignment
is 100%, but it must also cover the entire length of the plasmid in
the database); (3) the query length of the best match in the genome
sequence compared with the length of the template (the matching
plasmid replicon in the database); (4) the name of the contig or
scaffold in which the replicon is found; (5) the starting position of
the replicon in the contig; (6) notes on the plasmid; and (7) the
reference GenBank accession number of the plasmid in the database.
The accession numbers of the plasmids used to build the
PlasmidFinder database are very useful, because the reference
plasmid can be used in a BLASTN analysis to detect other contigs and
scaffolds in the query sequence that presumably belong to the same
plasmid whose replicon has been identified by PlasmidFinder (see
Note 3). A FASTA file containing the best-matching sequences from
the query genome can be downloaded from “Hit in genome sequences”.
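The colour legend of Fig. 1 combines columns (2) and (3) into one status. The sketch below shows one reading of that legend; it is not the web tool's code. The "query / template" text format assumed by `parse_lengths`, and the choice to let the "query shorter than template" warning (grey) take precedence over light green, are assumptions.

```python
from typing import Optional


def parse_lengths(cell: str) -> tuple[int, int]:
    """Split a 'query / template' length cell such as '260 / 261' (assumed format)."""
    query, template = (int(part) for part in cell.split("/"))
    return query, template


def hit_colour(identity: Optional[float], query_length: int = 0,
               template_length: int = 0) -> str:
    """Map a best hit to the colour categories described in the Fig. 1 legend.

    `identity` is None when no replicon matched above the selected threshold.
    """
    if identity is None:
        return "red"          # no plasmid match above the threshold
    if query_length < template_length:
        return "grey"         # match shorter than the database replicon
    if identity == 100.0:
        return "dark green"   # 100% identity over the full replicon length
    return "light green"      # full length, but not a perfect match


if __name__ == "__main__":
    q, t = parse_lengths("682 / 682")
    print(hit_colour(100.0, q, t))
```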
The extended output shows the alignments. In the extended output
format, green indicates matching nucleotides, red indicates
mismatches, and gray indicates alignment positions with no query
sequence. The downloadable files are text files containing the result
table and the alignments.
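The per-position colouring of the extended output can be illustrated on two aligned sequences. This is a sketch under stated assumptions, not the server's implementation: both sequences are assumed to be pre-aligned to equal length with "-" marking gaps, a gap in the template opposite a query base is counted as a mismatch, and the one-character `SYMBOLS` rendering is a hypothetical text substitute for the colours.

```python
def column_states(template: str, query: str) -> list[str]:
    """Label each alignment column green (match), red (mismatch) or gray (no query)."""
    if len(template) != len(query):
        raise ValueError("aligned sequences must have equal length")
    states = []
    for t, q in zip(template.upper(), query.upper()):
        if q == "-":
            states.append("gray")   # no query sequence at this position
        elif t == q:
            states.append("green")  # matching nucleotide
        else:
            states.append("red")    # mismatch (includes a template gap)
    return states


# Hypothetical text rendering of the colours
SYMBOLS = {"green": "|", "red": "x", "gray": " "}


def render(states: list[str]) -> str:
    return "".join(SYMBOLS[state] for state in states)


if __name__ == "__main__":
    template, query = "ACGTAC", "ACCT--"
    print(template)
    print(render(column_states(template, query)))
    print(query)
```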
3.5 pMLST Output
The output shows the sequence type (ST) associated with the query
and a table with six columns containing detailed results (Fig. 2):
(a) the allele name; (b) the percentage of nucleotides that are
identical between the best-matching pMLST allele in the database and
the corresponding sequence in the plasmid; (c) the length of the
alignment between the best-matching pMLST allele in the database
and the corresponding sequence in the plasmid; (d) the length of the
best-matching pMLST allele in the database; (e) the number of gaps in
the alignment; and (f) the name of the best-matching pMLST allele for
each gene identified. The output also shows the name of the input file
used in the analysis and links for downloading the results as text
files, as well as an optional graphical presentation of the alignment
of each locus against the allele in the selected pMLST scheme that
had the best alignment score (see Note 4).
Fig. 2 In silico pMLST output. Overview of the in silico pMLST output on the web page. For a perfectly matching
allele, the % identity will be 100, the allele length will equal the query length, and the number of gaps will be
0. Green indicates a perfect match, while red indicates an imperfect match or no match at all
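The Fig. 2 legend implies a simple rule for calling an ST: every locus must have a perfect allele match (100% identity, alignment length equal to allele length, no gaps), and the resulting combination of allele numbers is then looked up in the scheme's profile table. The sketch below illustrates that lookup only. The locus names, the "<locus>_<number>" allele naming, and the profile table are hypothetical; real profiles come from the pubmlst.org plasmid databases, and how the web tool reports imperfect or novel profiles is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocusHit:
    locus: str          # gene in the pMLST scheme
    allele: str         # best-matching allele, e.g. "locusA_3"
    identity: float     # % identity to the allele
    hsp_length: int     # alignment length
    allele_length: int  # length of the database allele
    gaps: int           # gaps in the alignment

    def is_perfect(self) -> bool:
        # Fig. 2 legend: 100% identity, full allele length, no gaps
        return (self.identity == 100.0
                and self.hsp_length == self.allele_length
                and self.gaps == 0)


# Hypothetical three-locus scheme and profile table (allele numbers -> ST)
SCHEME = ("locusA", "locusB", "locusC")
PROFILES = {(1, 1, 1): "ST1", (1, 2, 1): "ST2", (3, 2, 5): "ST7"}


def allele_number(allele: str) -> int:
    """Assumes allele names of the form '<locus>_<number>'."""
    return int(allele.rsplit("_", 1)[1])


def assign_st(hits: list[LocusHit]) -> Optional[str]:
    """Return the ST, or None if a locus is missing, imperfect, or unlisted."""
    by_locus = {hit.locus: hit for hit in hits}
    if any(locus not in by_locus for locus in SCHEME):
        return None  # a scheme locus was not found in the query
    if not all(by_locus[locus].is_perfect() for locus in SCHEME):
        return None  # at least one allele is not an exact match
    profile = tuple(allele_number(by_locus[locus].allele) for locus in SCHEME)
    return PROFILES.get(profile)  # None if the combination is not listed
```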
4 Notes
a-z, A-Z, 0-9, _, -
References
1. Carattoli A, Bertini A, Villa L, Falbo V, Hopkins KL, Threlfall J (2005) Identification of plasmids by PCR-based replicon typing. J Microbiol Methods 63:219–228
2. Datta N, Hedges RW (1971) Compatibility groups among fi- R factors. Nature 234:222–223
3. Carattoli A, Zankari E, Garcia-Fernandez A, Voldby Larsen M, Lund O, Villa L, Aarestrup FM, Hasman H (2014) PlasmidFinder and pMLST: in silico detection and typing of plasmids. Antimicrob Agents Chemother 58(7):3895–3903
4. Orlek A, Phan H, Sheppard AE, Doumith M, Ellington M, Peto T, Crook D, Walker AS, Woodford N, Anjum MF, Stoesser N (2017) Ordering the mob: insights into replicon and MOB typing schemes from analysis of a curated dataset of publicly available plasmids. Plasmid 91:42–52
5. García-Fernández A, Chiaretto G, Bertini A, Villa L, Fortini D, Ricci A, Carattoli A (2008) Multilocus sequence typing of IncI1 plasmids carrying extended-spectrum beta-lactamases in Escherichia coli and Salmonella of human and animal origin. J Antimicrob Chemother 61:1229–1233
6. Villa L, García-Fernández A, Fortini D, Carattoli A (2010) Replicon sequence typing of IncF plasmids carrying virulence and resistance determinants. J Antimicrob Chemother 65:2518–2529
7. García-Fernández A, Carattoli A (2010) Plasmid double locus sequence typing for IncHI2 plasmids, a subtyping scheme for the characterization of IncHI2 plasmids carrying extended-spectrum beta-lactamase and quinolone resistance genes. J Antimicrob Chemother 65:1155–1161
8. García-Fernández A, Villa L, Moodley A, Hasman H, Miriagou V, Guardabassi L, Carattoli A (2011) Multilocus sequence typing of IncN plasmids. J Antimicrob Chemother 66:1987–1991
9. Phan MD, Kidgell C, Nair S, Holt KE, Turner AK, Hinds J, Butcher P, Cooke FJ (2009) Variation in Salmonella enterica serovar typhi IncHI1 plasmids during the global spread of resistant typhoid fever. Antimicrob Agents Chemother 53:716–727
10. Jensen LB, Garcia-Migura L, Valenzuela AJ, Løhr M, Hasman H, Aarestrup FM (2010) A classification system for plasmids from enterococci and other Gram-positive bacteria. J Microbiol Methods 80:25–43
11. C L, García-Migura L, Aspiroz C, Zarazaga M, Torres C, Aarestrup FM (2012) Expansion of a plasmid classification system for Gram-positive bacteria and determination of the diversity of plasmids in Staphylococcus aureus strains of human, animal, and food origins. Appl Environ Microbiol 78:5948–5955
12. Thomsen MC, Ahrenfeldt J, Cisneros JL, Jurtz V, Larsen MV, Hasman H, Aarestrup FM, Lund O (2016) A bacterial analysis platform: an integrated system for analysing bacterial whole genome sequencing data for clinical diagnostics and surveillance. PLoS One 11:e0157718
13. Clausen PTLC, Aarestrup FM, Lund O (2018) Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics 19:307
14. Dolejska M, Villa L, Poirel L et al (2013) Complete sequencing of an IncHI1 plasmid encoding the carbapenemase NDM-1, the ArmA 16S RNA methylase and a resistance-nodulation-cell division/multidrug efflux pump. J Antimicrob Chemother 68:34–39
15. Villa L, Poirel L, Nordmann P et al (2012) Complete sequencing of an IncH plasmid carrying the blaNDM-1, blaCTX-M-15 and qnrB1 genes. J Antimicrob Chemother 67:1645–1650