
7/25/2022

Sequence‐based Bacterial Typing: Concepts and Approaches
OHSI 2022

Mostafa Ghanem, University of Maryland
mghanem@umd.edu

Outline
• Different Sequence Based Typing Approaches.

• Concept of Each Approach

• Advantages And Disadvantages


Molecular typing (Genotyping)

a) DNA banding pattern based methods (RAPD, PFGE, RFLP)

b) DNA hybridization‐based methods (Microarrays)

c) DNA sequencing based methods

Sequence based typing
Seq1    ACTGTTGGCACGATTT
Seq2    AGTGTTGCCACGATTT
Seq3    ACTGTTGGCAGGATTT
Seq4    ACTGTAGGCACGATTT
Seq5    ACTGTAGGCACGATTA
Sanger method, 1977

The first molecular sequence based phylogenetic classification of living organisms into three main domains 
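
As a minimal sketch of how an alignment like the one above becomes a measure of relatedness, the five example sequences can be compared position by position. The function name `hamming` is illustrative, and the sketch assumes the sequences are already aligned and of equal length.

```python
from itertools import combinations

# The five aligned example sequences from the slide.
seqs = {
    "Seq1": "ACTGTTGGCACGATTT",
    "Seq2": "AGTGTTGCCACGATTT",
    "Seq3": "ACTGTTGGCAGGATTT",
    "Seq4": "ACTGTAGGCACGATTT",
    "Seq5": "ACTGTAGGCACGATTA",
}

def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

# Pairwise difference counts, keyed by sequence-name pairs.
distances = {(i, j): hamming(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}

for (i, j), d in distances.items():
    print(f"{i} vs {j}: {d} difference(s)")
```

Pairs with fewer differences (for example Seq4 and Seq5) would be placed closer together in a tree than pairs with more differences.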

What is a SNP?
Single Nucleotide Polymorphism (SNP)
ATGTTCCTC sequence
ATGTTGCTC reference
*phylogenetically informative differences

Insertion or Deletion (Indel)
ATGTTCCCTC sequence
ATGTTC-CTC reference
*differences not used in hqSNP analysis

Large recombination event that introduces a large prophage
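
The distinction between a SNP and an indel can be sketched on the two small alignments shown above. This is only an illustration: it assumes the sample and reference are already aligned with `-` marking gaps, and the function name `classify_differences` is hypothetical.

```python
def classify_differences(sample: str, reference: str):
    """Classify each differing column of a gapped pairwise alignment.

    Returns a list of (position, reference_base, sample_base, kind) tuples,
    with 1-based positions along the alignment.
    """
    if len(sample) != len(reference):
        raise ValueError("aligned sequences must have the same length")
    events = []
    for pos, (s, r) in enumerate(zip(sample, reference), start=1):
        if s == r:
            continue
        # A gap on either side means an insertion or deletion, not a SNP.
        kind = "indel" if "-" in (s, r) else "SNP"
        events.append((pos, r, s, kind))
    return events

print(classify_differences("ATGTTCCTC", "ATGTTGCTC"))
print(classify_differences("ATGTTCCCTC", "ATGTTC-CTC"))
```

In an hqSNP workflow only the events labelled `SNP` would be carried forward; the indel columns would be ignored, as noted on the slide.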

Use of sequence data to assess relatedness of organisms
▪ Differences in sequences can be used to assess the relatedness of organisms and the likelihood of a recent common ancestor
▪ The definition of “recent” becomes important: recent in years or in generation times?
  ▪ Salmonella in a dry processing plant may stay dormant and rarely, if ever, multiply (or imagine anthrax spores in soil)
  ▪ Salmonella in a chicken flock may multiply every 30 min (>17,500 times a year)
▪ Assessing relationships of microbial isolates typically requires more information than just sequence data
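
A quick arithmetic check of the generation count, assuming a constant 30-minute doubling time all year round (an idealization; real populations do not divide continuously):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years

def generations_per_year(doubling_time_min: float) -> float:
    """Number of generations in one year for a constant doubling time."""
    return MINUTES_PER_YEAR / doubling_time_min

print(generations_per_year(30))  # 17520.0
```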


Medini et al. (2008).

Level of discrimination
Low – few or multiple stable genes – look at long term evolutionary trends
High – more genes, possibly variable gene(s) ‐ outbreak investigation / local surveillance

Typing approaches (in order of increasing information)
Protein
• Serotyping
DNA
• PFGE (Pulsed Field Gel Electrophoresis): total gDNA fragments
Sequencing
• 16S rRNA (Ribosomal RNA Sequencing): 1 gene
• MLST (Multi Locus Sequence Typing): 7 genes
WGS
• wgMLST (Whole Genome Multi Locus Sequence Typing): thousands of reference genes plus pan genome
• wgSNP or hqSNP (Whole Genome Single Nucleotide Polymorphism Typing): total gDNA


Sequence Based Approaches
▪ Single locus based methods
▪ Multilocus Sequence Typing (MLST)
▪ K‐mer–based typing approaches
▪ High quality SNP (hqSNP) typing approaches
▪ Allele based typing approaches (rMLST, cgMLST, wgMLST)
▪ WGS for phenotypic typing (AMR typing, virulence typing)

Fig: The number of publications related to bacterial typing methods as a function of time (Pérez‐Losada et al., 2013)


Typing approach evaluation criteria
• Typeability: capacity to produce clearly interpretable results with most strains of the bacterial species
• Reproducibility: capacity to repeatedly obtain the same typing profile result with the same bacterial strain
• Discriminatory power: ability to produce results that clearly allow differentiation between unrelated strains of the same bacterial species
• Practicality (ease of performance & interpretation): method should be versatile, relatively rapid, inexpensive, technically simple and provide readily interpretable results
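
Discriminatory power is often put on a numeric scale with Simpson's index of diversity (commonly called the Hunter-Gaston index in the typing literature). The slides do not name this index, so the sketch below is an assumption about how the criterion might be quantified, not part of the presented material. The index is the probability that two isolates drawn at random from the sample are assigned different types.

```python
from collections import Counter

def discriminatory_index(types):
    """Simpson's index of diversity for a list of type assignments.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of isolates of type j and N the total.
    """
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))

# Hypothetical typing results for four unrelated isolates.
print(discriminatory_index(["ST1", "ST1", "ST2", "ST3"]))
```

A value of 1.0 means every isolate received its own type; 0.0 means the method could not separate any of them.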

Single locus based methods
• 16S rRNA region sequence
• Ribosomal intergenic spacer analysis
  • 16S–23S intergenic spacer region (IGSR) of Mycoplasma gallisepticum (MG)
• Surface variable and polymorphic genes
  • vlhA typing of Mycoplasma synoviae (MS)
  • spa typing for Staphylococcus aureus
Disadvantages:
• Insufficient discriminatory power
• Limited reliability for inferring evolutionary relationships


Traditional MLST (Allele based typing)

• The gold standard before WGS
• Uses housekeeping (HK) genes; reflects population structure
http://beta.mlst.net/Instructions/default.html

Sequence type (ST) vs Clonal Complex (CC)

• An ST is assigned based on the allelic profile of the 7 genes
• A CC is a group of STs that share 5 or more alleles with a central (ancestral) sequence type
• Relative to the central ST, an ST can be a:
  • Single locus variant (SLV): differs at one locus
  • Double locus variant (DLV): differs at two loci
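
The ST and CC definitions above can be sketched with allelic profiles stored as tuples of allele numbers. The profiles, ST names and choice of central ST below are hypothetical, and the 5-of-7 threshold simply follows the definition on the slide.

```python
# Hypothetical allelic profiles: one allele number per housekeeping locus.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),  # treated here as the central (ancestral) ST
    "ST2": (1, 1, 2, 1, 1, 1, 1),
    "ST3": (1, 3, 2, 1, 1, 1, 1),
    "ST4": (4, 3, 2, 5, 6, 1, 1),
}

def shared_alleles(p, q) -> int:
    """Number of loci at which two profiles carry the same allele."""
    return sum(a == b for a, b in zip(p, q))

def classify(profile, central) -> str:
    """Label a profile by how many loci differ from the central ST."""
    diff = len(central) - shared_alleles(profile, central)
    if diff == 0:
        return "central ST"
    if diff == 1:
        return "SLV"
    if diff == 2:
        return "DLV"
    return f"{diff}-locus variant"

def in_clonal_complex(profile, central, min_shared: int = 5) -> bool:
    """True if the profile shares at least `min_shared` alleles with the central ST."""
    return shared_alleles(profile, central) >= min_shared

central = profiles["ST1"]
for st, profile in profiles.items():
    print(st, classify(profile, central), in_clonal_complex(profile, central))
```

Here ST2 (SLV) and ST3 (DLV) fall in the same clonal complex as ST1, while ST4 shares only two alleles and does not.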


Sequence type (ST) VS Clonal Complex (CC)

Fig. A minimum spanning tree for the 101 samples and eight clonal complexes typed by seven‐locus MLST (Ghanem and El‐Gazzar, 2019).

Traditional MLST
Advantages:
• High typeability and reproducibility
• Sequence‐based: high accuracy and relatively good discrimination
• Central database: easier exchange of data and comparison of strains globally (https://pubmlst.org/)
• Expandable nomenclature
• Can now be performed using WGS data
Disadvantages:
• Not useful for organisms with highly conserved housekeeping genes (HKGs)
• Targets are selected to represent population structure, so it is less useful for outbreak detection


Whole genome sequence based approaches

Medini et al. (2008).



Whole Genome Sequencing
[Figure: the read ATGCGTGATCTAGTAGTCTAGGAGCTGACCGATTA repeated many times, illustrating multiple sequence reads covering the same region]

Results interpretation and action


K‐mer–based typing approaches
▪ Compare the genomes piece by small piece (k‐mers) to find the pieces that differ
  • Assembly‐free and alignment‐free approach
  • Avoids high computational requirements
  • Faster and more suitable for real‐time epidemiological typing

Disadvantages
  • Does not consider the genomic context or the sequence quality at each locus
  • Does not yield a true phylogenetic relationship
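
One simple way to compare genomes "piece by piece" is to collect all k‐mers of each sequence into a set and compute a Jaccard distance between the sets. This is a minimal sketch of the general idea, not the algorithm of any particular tool; it ignores reverse complements and sequencing errors, and the function names and the choice of k are illustrative.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a: str, b: str, k: int = 5) -> float:
    """1 - |shared k-mers| / |all k-mers| for two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        raise ValueError("sequences are shorter than k")
    return 1 - len(ka & kb) / len(union)

# Two short example sequences that differ at a single position.
s1 = "ACTGTTGGCACGATTT"
s2 = "ACTGTTGGCAGGATTT"
print(jaccard_distance(s1, s1))
print(jaccard_distance(s1, s2))
```

A single base change alters every k‐mer that overlaps it, which is why the distance says that two genomes differ without saying where or how; this mirrors the disadvantages listed above.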


High Quality Single Nucleotide Polymorphism (hqSNP) Typing Approaches
Workflow: reference mapping → SNP detection → SNP evaluation

Reference genome
• Serves as a template against which all reads from the other genomes are mapped.
• Has to be closely related to the mapped samples.
• Selection of the reference genome varies according to:
  ▪ Genomic diversity of the organism
  ▪ Aim of the study
  ▪ Context of the investigation

What makes a SNP high quality (hq)?

Sequence reads → quality filter → quality‐filtered sequence reads ready for analysis
• The quality filter removes nucleotides in the sequence reads from the comparison based on sequence coverage and quality

The alphabet soup of analysis – Coverage

[Figure: alphabet soup illustrating coverage at 40x versus coverage at 5x. Image: http://missusrousselee.deviantart.com/art/Alphabet‐Soup‐134724659]

▪ Any single location on the genome can have zero to hundreds of sequence reads covering it
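
Coverage can be estimated for a whole run or counted position by position. The sketch below assumes fixed‐length reads and 0‐based start coordinates on a linear genome; the function names and the numbers in the example are illustrative only.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_size

def depth_per_position(read_starts, read_length: int, genome_size: int):
    """Count how many reads cover each position of the genome."""
    depth = [0] * genome_size
    for start in read_starts:
        # Reads that run past the end of the genome are truncated.
        for pos in range(start, min(start + read_length, genome_size)):
            depth[pos] += 1
    return depth

# 1.5 million reads of 150 bp over a 5 Mb genome.
print(mean_coverage(1_500_000, 150, 5_000_000))  # 45.0

# Three 4-bp reads on a toy 10-bp genome.
print(depth_per_position([0, 2, 2], 4, 10))  # [1, 1, 3, 3, 2, 2, 0, 0, 0, 0]
```

The toy example shows why an average is not enough: some positions have depth 3 and others have no reads at all.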


What to call a SNP

[Figure: a pileup of reads aligned to the reference ATGTTGCTC. Most reads show ATGTTCCTC, while a few show other bases at the same position (ATGTTACTC, ATGTTTCTC). Is it a SNP?]

▪ SNPs are called based on:
  • Quality
  • Coverage
  • Base frequency
▪ The differences between the reference and the compared genome are extracted and used to determine relatedness
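
The three criteria can be sketched as a filter applied to the bases observed at one reference position. The thresholds below (`min_depth`, `min_fraction`) are placeholder values chosen for illustration, not recommendations from the slides, and the sketch assumes low‐quality bases were already removed from `read_bases`.

```python
from collections import Counter

def call_snp(ref_base: str, read_bases: str,
             min_depth: int = 10, min_fraction: float = 0.9):
    """Return the alternate base if the position passes the filters, else None.

    read_bases holds the quality-filtered bases seen at this position,
    one character per read.
    """
    depth = len(read_bases)
    if depth < min_depth:
        return None  # coverage too low to trust any call
    base, count = Counter(read_bases).most_common(1)[0]
    if base == ref_base:
        return None  # majority of reads agree with the reference
    if count / depth < min_fraction:
        return None  # reads disagree too much among themselves
    return base

# A pileup like the one in the figure: eleven C reads, one A, one T; reference G.
pileup = "C" * 11 + "AT"
print(call_snp("G", pileup))                    # fails the 0.9 frequency filter
print(call_snp("G", pileup, min_fraction=0.8))  # passes a looser filter
```

The same position is or is not a SNP depending on the chosen thresholds, which is one reason SNP counts differ between pipelines.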

(hqSNP) Typing Approaches

Detected SNPs are screened for:
• Functional implications (genic or intergenic, synonymous or non‐synonymous).
• Distribution across the reference genome (e.g. highly variable regions).
SNPs related to recombination are excluded.


Where to call a SNP?
▪ Not all SNP pipelines are equal: where you call SNPs will affect the total SNP count
▪ SNPs relevant for phylogenetic analysis are vertically transmitted, not horizontally, so horizontally acquired genetic elements like phages can be masked

[Figure: raw reads mapped across genes and mobile elements. Two options are shown: mask mobile elements (do not consider SNPs in these locations), or only call SNPs in genes]
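
Masking can be sketched as dropping every SNP whose position falls inside an annotated interval. The coordinates below are made up for illustration, and the function name `mask_snps` is hypothetical; real pipelines take the intervals from an annotation or a prophage prediction.

```python
def mask_snps(snp_positions, masked_regions):
    """Drop SNP positions that fall inside any masked (start, end) interval.

    Intervals are inclusive at both ends.
    """
    return [
        p for p in snp_positions
        if not any(start <= p <= end for start, end in masked_regions)
    ]

snps = [120, 5300, 5400, 9000]
prophage = [(5000, 6000)]  # hypothetical mobile element coordinates
print(mask_snps(snps, prophage))  # [120, 9000]
```

Two of the four SNPs disappear once the mobile element is masked, so the reported SNP distance between isolates changes with the masking choice.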

High Quality Single Nucleotide Polymorphism (hqSNP) Typing Approaches
• The remaining SNPs represent parsimony‐informative loci within the core bacterial genome
• The hqSNP typing approach is the most widely used whole genome strain typing method
• Works best with clonal organisms like Salmonella


How to report SNP data – keep it simple
New Cluster: 2016039
Hi folks:
Two isolates are 0 SNPs from each other:
E2017003216 (SE77B52)
E2017003039 (SE77B52)

New Cluster: 2016040
Two isolates are 2 SNPs from each other:
E2017002910 (SE1B1)
I2017003132 (SE1B1)

Limitations of Whole‐genome SNP typing approaches
• Difficult to apply to long‐term or global‐scale multi‐outbreak analysis
• Computational resources and experienced bioinformaticians are necessary
• Challenging in bacteria with high genomic diversity and/or extra‐chromosomal or mobile genetic elements
• Creating a standard method for WGS SNP typing is very difficult and impractical


hqSNP analyses
Advantages
• Phylogenetically informative (builds a tree consistent with the evolution of the strains)
• SNP position can be identified on the genome (the gene affected can be identified)
Disadvantages
• Requires a closely related reference genome: hqSNP analysis is problematic if the reference genome is not closely related
• Takes a while and requires a lot of computer power
• Interpretation of data depends on the genomes added: it is not stable and does not lead to a nomenclature
When to Use
• Good for situations where a wgMLST database has not been developed and validated
• May provide the highest amount of resolution for strain comparison

Allele based typing
(rMLST‐ cgMLST‐wgMLST)
Expanding the concept of MLST from 7 genes to a genome‐wide, gene‐by‐gene typing approach.

[Figure: reproducibility + portability versus discriminatory power for SNP typing, cgMLST (100s‐1000s of genes) and MLST (few genes, 5‐7)]

Allele based typing
(rMLST‐ cgMLST‐wgMLST)
▪ The database is built from gene content representing a diverse selection of the genus/species of the organism being compared
▪ Each unique gene is referred to as a “locus”
▪ Any change (SNP, insertion, deletion) results in a new allele call for that locus
▪ New alleles are named sequentially as they are encountered, not based on sequence

Locus 1    ACTAGAGGGAAA    ACTAGAGGCTAA    ACT-GAGGGAAA
           allele 1        allele 2        allele 3
                           (2 SNPs)        (1 indel)
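
Sequential allele naming can be sketched as a per‐locus lookup table that hands out the next number whenever an unseen sequence arrives. The class name `LocusAlleles` is hypothetical and the sketch stands in for what a curated allele database does at much larger scale.

```python
class LocusAlleles:
    """Minimal allele registry for a single locus."""

    def __init__(self):
        self._alleles = {}  # sequence -> allele number

    def call(self, sequence: str) -> int:
        """Return the allele number, assigning the next one if the sequence is new."""
        if sequence not in self._alleles:
            self._alleles[sequence] = len(self._alleles) + 1
        return self._alleles[sequence]

locus1 = LocusAlleles()
print(locus1.call("ACTAGAGGGAAA"))  # 1
print(locus1.call("ACTAGAGGCTAA"))  # 2 (two SNPs relative to allele 1)
print(locus1.call("ACTGAGGGAAA"))   # 3 (one deleted base relative to allele 1)
print(locus1.call("ACTAGAGGGAAA"))  # 1 again: the sequence is already known
```

Allele 2 and allele 3 each count as one difference from allele 1, even though one carries two SNPs and the other a single indel; the numbers carry no information about how similar the sequences are.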

Allele based typing
(rMLST‐ cgMLST‐wgMLST)
▪ Allows for simpler analysis and clear naming of subtypes
▪ Performs comparison on a gene‐by‐gene level

                      Isolate A   Isolate B   Isolate C
Locus 1 (20 nt)           1           1           1
Locus 2 (100 nt)          8           8          12
Locus 3 (5000 nt)         5           5           2
Etc.
Locus 2,005 (5 nt)        4           4           4
wgMLST type               A           A           B
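
The gene‐by‐gene comparison in the table reduces to counting loci with different allele numbers. The sketch below uses only the four loci shown on the slide; treating loci missing from either isolate as uninformative is an assumption of this sketch.

```python
# Allele numbers for the loci shown in the table above.
profiles = {
    "Isolate A": {"Locus 1": 1, "Locus 2": 8, "Locus 3": 5, "Locus 2005": 4},
    "Isolate B": {"Locus 1": 1, "Locus 2": 8, "Locus 3": 5, "Locus 2005": 4},
    "Isolate C": {"Locus 1": 1, "Locus 2": 12, "Locus 3": 2, "Locus 2005": 4},
}

def allele_distance(p: dict, q: dict) -> int:
    """Number of loci, present in both profiles, with different allele numbers."""
    shared_loci = p.keys() & q.keys()
    return sum(p[locus] != q[locus] for locus in shared_loci)

print(allele_distance(profiles["Isolate A"], profiles["Isolate B"]))  # 0
print(allele_distance(profiles["Isolate A"], profiles["Isolate C"]))  # 2
```

Isolates A and B are indistinguishable and share a wgMLST type, while isolate C differs at two loci regardless of how long those loci are.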


http://www.ridom.de/seqsphere/cgmlst/

Fig. Standardized hierarchical microbial WGS typing approach. From bottom to top
with increasing discriminatory power.

MLST vs cgMLST

[Figure: discriminatory power of cgMLST compared with 7‐gene MLST and 5‐gene MLST (Ghanem and El‐Gazzar, 2018)]


Allele based typing
(rMLST‐ cgMLST‐wgMLST)
The allele calls at each locus are compared between isolates and 
differences are used to determine relatedness

Fig. A) Seven‐locus MLST dendrogram displaying 101 samples, including Sanger‐sequenced clinical samples, with the sequence type (ST) and the clonal complex (CC). B) cgMLST dendrogram displaying 81 clinical and reference MG samples (Ghanem and El‐Gazzar, 2019).


The new way to use in silico MLST in the NGS era. 

Kimura et al., 2017

Fig. Schematic representation of the Bacterial Isolate Genome Sequence Database platform and the gene‐by‐gene approach to nucleotide sequence analysis (Cody et al., 2014).


cgMLST
• Has become very popular
• Public databases are available (cgMLST.org, BIGSdb)

How to report wgMLST data – keep it simple
New Cluster: 2016039
Hi folks:
Two isolates are 0 alleles from each other:
E2017003216 (SE77B52)
E2017003039 (SE77B52)

New Cluster: 2016040
Two isolates are 2 alleles from each other:
E2017002910 (SE1B1)
I2017003132 (SE1B1)

Allele based typing approach
(rMLST‐ cgMLST‐wgMLST)
• Unique and expandable nomenclature
• Can be standardized
• No need for a reference genome for mapping
• Can be applied to related and unrelated genomes (multiple outbreaks)
• Computationally less intensive
• Lineage‐specific SNP/allele approaches can be used to gain more discriminatory power

Allele based typing approach
(rMLST‐ cgMLST‐wgMLST)
▪ Faster than analyzing SNP differences
▪ For WGS data, allele calls can be performed on short reads (“assembly free”) and on assembled genomes (“assembly‐based”)
▪ If there is a conflict between the allele calls, then no allele call is made
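
The conflict rule can be sketched as a small function that combines the two calls for one locus. The slide only states what happens on a conflict; accepting a lone call when the other method produced none is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Optional

def consensus_call(assembly_free: Optional[int],
                   assembly_based: Optional[int]) -> Optional[int]:
    """Combine two allele calls for a locus; None means no call."""
    if assembly_free is None:
        return assembly_based  # assumption: accept the only available call
    if assembly_based is None:
        return assembly_free   # assumption: accept the only available call
    if assembly_free == assembly_based:
        return assembly_free   # both methods agree
    return None                # conflict: no allele call is made

print(consensus_call(8, 8))     # 8
print(consensus_call(8, 12))    # None
print(consensus_call(None, 5))  # 5
```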


Limitations of cgMLST

• Dependence on variation within a set of predefined loci.
• Information within noncoding sequences or non‐predefined loci will not be included in the analysis.

Advantages and Caveats of wgMLST analysis
Advantages
• Phylogenetically informative
• All virulence, serotyping, and antibiotic resistance genes can be pulled out as part of the analysis
• Neutralizes the effects of horizontal gene transfer
• Allele calling is stable: data are standardizable and directly comparable between laboratories, reproducibility does not depend on the choice of reference strain, and the approach is amenable to automated bioinformatics
Disadvantages
• Initial assignment of alleles is computationally costly
• Compares character data (allele numbers) rather than genetic data
• SNPs and indels are treated equally
• Requires curation for allele calls
When to Use
• Surveillance, especially for a distributed testing network
• Reference characterization
• Accurate cluster detection
• Need to communicate with partners using stable nomenclature


hqSNP versus cgMLST Analysis
▪ Both analyses are conducted from the same raw data (typically short read sequencing data)
▪ For public health purposes, both correlate well
  ▪ i.e. the outermost branches of the phylogenetic trees are almost identical
▪ The two are not mutually exclusive
  ▪ For some use cases cgMLST works better; for others SNP works better

Limitations of WGS based typing approaches
▪ Defining the gold standard???
  • When to judge two isolates as indistinguishable, closely related, possibly related, or different
  • Relating results to clinical and epidemiological data
  • Using results to answer different questions
  • How the results compare to traditional typing methods (PFGE)
▪ Cost and difficulty of analysis


Reference Characterization by WGS
“One Shot” Characterization of STEC

ANI GENUS/SPECIES: Escherichia coli

SerotypeFinder SEROTYPE: O104:H4


PATHOTYPE: Shiga toxin producing and Enteroaggregative E. coli (STEC & EAEC)
VirulenceFinder VIRULENCE PROFILE: stx2a, aggR, aggA, sigA, sepA, pic, aatA, aaiC, aap
7-gene MLST SEQUENCE TYPE: ST678
ResFinder ANTIMICROBIAL RESISTANCE GENES: blaTEM-1, blaCTX-M-15, strAB, sul2, tet(A)A, dfrA7

Phylogenetic ID wgMLST CODE: 102:45.26.35.3

Summary of Potential WGS Applications
▪ Outbreak investigation
  ▪ Sporadic vs outbreak
  ▪ Not just cluster but phylogenetic relationships
▪ Microbial Source Tracking (MST)
▪ Microbial Surveillance
  ▪ Food
  ▪ Environment
    ▪ Animals, soil, food prep areas, hospitals, etc.
▪ Antibiotic resistance monitoring
  ▪ Genotype predicts phenotype
  ▪ Mobile vs integrated
▪ Virulence gene monitoring
▪ What else???


Summary

▪ Different Sequence Based Typing Approaches
▪ Concept of Each Approach
▪ Advantages and Disadvantages

Resources and References
New York Integrated Food Safety Center of Excellence, Molecular Epidemiology and Sequencing Approaches 
in Public Health ‐ Webinars
Schürch, A. C., et al. "Whole genome sequencing options for bacterial strain typing and 
epidemiologic analysis based on single nucleotide polymorphism versus gene‐by‐gene–
based approaches." Clinical Microbiology and Infection 24.4 (2018): 350‐354.
Pérez‐Losada, Marcos, et al. "Pathogen typing in the genomics era: MLST and the future 
of molecular epidemiology." Infection, Genetics and Evolution 16 (2013): 38‐53.
Pérez‐Losada, Marcos, Miguel Arenas, and Eduardo Castro‐Nallar. "Microbial sequence 
typing in the genomic era." Infection, Genetics and Evolution (2017).
Cody, Alison J., Julia S. Bennett, and Martin CJ Maiden. "Multi‐locus sequence typing and 
the gene‐by‐gene approach to bacterial classification and analysis of population 
variation." Methods in microbiology. Vol. 41. Academic Press, 2014. 201‐219.
Chui, Linda, and Vincent Li. "Technical and Software Advances in Bacterial Pathogen 
Typing." Methods in Microbiology. Vol. 42. Academic Press, 2015. 289‐327.


Questions?

Thanks

Mostafa Ghanem
Department of Veterinary Medicine, University of Maryland
301.314.1191/ mghanem@umd.edu


WGS for phenotypic predictions
▪ The Center for Genomic Epidemiology provides web‐based services
  ▪ AMR typing using ResFinder
  ▪ Virulence typing using VirulenceFinder
▪ These tools use BLAST to identify acquired AMR and virulence genes in whole‐genome data (pre‐assembled, partial or complete genomes).

Proof of concept for AMR phenotypic prediction
▪ The ResFinder tool was used to identify resistance genes in 23 bacterial isolates of 5 different species.
▪ A predicted resistance phenotype was determined based on the resistance genes identified using BLAST with 100% identity.
▪ The predicted resistance phenotype was compared with the original phenotypic resistance profile.
▪ There was almost complete agreement between the predicted resistance phenotype and phenotypic testing.
▪ Zankari, Ea, et al. "Identification of acquired antimicrobial resistance genes." Journal of Antimicrobial Chemotherapy 67.11 (2012): 2640‐2644.
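
Agreement between predicted and tested phenotypes can be expressed as a simple concordance fraction over isolate/drug pairs. The data below are entirely hypothetical and are not taken from the cited study; "R" and "S" stand for resistant and susceptible, and the function name is illustrative.

```python
def concordance(predicted: dict, observed: dict) -> float:
    """Fraction of isolate/drug pairs where genotype prediction matches the test."""
    keys = predicted.keys() & observed.keys()
    if not keys:
        raise ValueError("no isolate/drug pairs in common")
    agree = sum(predicted[k] == observed[k] for k in keys)
    return agree / len(keys)

# Hypothetical results keyed by (isolate, drug).
predicted = {
    ("iso1", "ampicillin"): "R",
    ("iso1", "tetracycline"): "S",
    ("iso2", "ampicillin"): "S",
    ("iso2", "tetracycline"): "R",
}
observed = {
    ("iso1", "ampicillin"): "R",
    ("iso1", "tetracycline"): "S",
    ("iso2", "ampicillin"): "R",  # resistance with no known gene detected
    ("iso2", "tetracycline"): "R",
}
print(concordance(predicted, observed))  # 0.75
```

The one disagreement in this toy data set is the kind of case where resistance has a mechanism that is missing from the gene database.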


Proof of concept for genotypic monitoring


1. Zankari E., et al. Genotyping using whole‐genome sequencing is a realistic alternative to surveillance
based on phenotypic antimicrobial susceptibility testing. J Antimicrob Chemother. 2013. 68(4):771‐7.
2. Gordon NC., et al. Prediction of Staphylococcus aureus antimicrobial resistance by whole‐genome 
sequencing. J Clin Microbiol. 2014. 52(4):1182‐91.
3. Stoesser N, et al. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella
pneumoniae isolates using whole genomic sequence data. J Antimicrob Chemother. 2013.
68(10):2234‐44.
4. Chakravorty S., et al. Genotypic susceptibility testing of Mycobacterium tuberculosis for amikacin and
kanamycin resistance using a rapid Sloppy Molecular Beacon based assay identifies more cases of low level
drug resistance than phenotypic Lowenstein‐Jensen testing. J Clin Microbiol. 2015 53:43‐51.
5. Tyson, G.H. et al. Whole‐genome sequencing accurately predicts antimicrobial resistance in
Escherichia coli. J Antimicrob Chemother. 2015 Oct;70(10):2763‐9.
6. Zhao, S., et al. Whole genome sequencing analysis accurately predicts antimicrobial resistance phenotypes
in Campylobacter. Appl Environ Microbiol. 2015 Oct 30;82(2)
7. Tyson G.H., et al. Using genotypic methods to determine streptomycin resistance breakpoints for Salmonella
and Escherichia coli. FEMS Microbiol Lett. 2016 Feb;363(4).
And more to come

Pipelines for detection of antimicrobial resistance genes
▪ AMRFinderPlus
  https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial‐resistance/AMRFinder/
▪ ARG‐ANNOT – Antibiotic Resistance Gene‐ANNOTation
  http://en.mediterranee‐infection.com/article.php?laref=283%26titre=arg‐annot
▪ ARDB – Antibiotic Resistance Gene Database (not maintained anymore)
  https://ardb.cbcb.umd.edu/
▪ CARD – The Comprehensive Antibiotic Resistance Database
  https://card.mcmaster.ca/
▪ Resfams
  http://www.dantaslab.org/resfams/
▪ SSTAR – Sequence Search Tool for Antimicrobial Resistance
  https://github.com/tomdeman‐bio/Sequence‐Search‐Tool‐for‐Antimicrobial‐Resistance‐SSTAR‐


Limitations of WGS predictions
▪ Can only identify known resistance genes/mutations
  • Novel genes or variants may not be detected if they have low homology to known ones
▪ Need a comprehensive, accurate, highly curated and updated resistance gene database
▪ Expertise is needed to analyze data
  • Automation is making it easier
▪ Fragmented genomes
  • Complicate identification of resistance elements
  • Assembly methods may improve; raw data are always available
