
High-throughput sequencing for tracking bacterial and viral disease
Paul D Brown, PhD
MICR3214: Molecular Microbiology
From molecular to genomic
epidemiology
• From a perspective of medical and public
health microbiology and epidemiology,
whole genome sequencing (WGS)
combines two decisive advantages
compared to previous methods:
– Provides maximal strain discrimination
– Can be linked to clinically and epidemiologically
relevant phenotypes
• Widely seen as the ultimate tool for
epidemiological typing of bacteria and other
pathogens
Next-Generation Sequencing

• A high-throughput
sequencing method
that parallelizes the
sequencing
process, producing
millions of
sequences at once.
Next-Generation Sequencing

• Over the past six years, “Next-Generation”


sequencing technologies have made
accessible data capable of answering
questions fundamental to our understanding
of life and the factors that govern human
health.

• The combination of the vast increase in data


generated, coupled with plummeting costs
required to generate these data, has rendered
this technology a tractable, general purpose
tool for a variety of applications.
Next-Generation Sequencing

• Traditionally, the molecular typing of organisms as


an aid to infection control has been limited to
investigating clusters identified by surveillance
methods to see if the involved isolates are clonal.
• If clonality is established, then the clonal cluster is
assumed to represent a true outbreak, and it is
then investigated to identify infection control
breaches and to institute measures to prevent
further transmission.
• Often some cases can be excluded from the cluster
when they are distinct from the clonal isolates,
making the investigation of the outbreak more
efficient
Next-Generation Sequencing

• If the clonality of a cluster is not established by


molecular typing, then the cluster is usually
presumed to be a “pseudo-outbreak,” occurring
by chance, with no further investigation being
necessary

• It is through these two factors - improving


efficiency of outbreak investigation and the ability
to identify “pseudo-outbreaks” - that molecular
typing can improve the cost-effectiveness of
nosocomial infection surveillance
Next-Generation Sequencing

• For subtyping in the context of molecular


surveillance, there is no need for a fully closed and
annotated genome.
• Instead, for the sake of interpretation, the
complexity of the data needs
to be reduced in a
biologically meaningful way.
• Two different workflows
• genome-wide SNPs
• gene-by-gene systems.
Genome-wide SNPs
• In a genome-wide SNP analysis, only Single Nucleotide
Polymorphisms (SNPs) within a set of samples are taken into
account.
• This excludes indels, inversions, and translocations.
• A consistent position numbering (i.e. against a reference
genome) is maintained.
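A minimal Python sketch of the SNP-only comparison described above, assuming the isolates have already been aligned to the same reference coordinates. The isolate names and sequences are invented for illustration, and gap characters stand in for indels, which are skipped rather than counted.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two reference-aligned sequences differ by a substitution."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must share the same reference coordinates")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        # Gaps and ambiguous bases are ignored, so indels never count as SNPs.
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def variable_positions(alignment: dict) -> list:
    """Return 1-based reference positions where the sample set shows more than one base."""
    length = len(next(iter(alignment.values())))
    positions = []
    for i in range(length):
        bases = {seq[i].upper() for seq in alignment.values()} & VALID_BASES
        if len(bases) > 1:
            positions.append(i + 1)
    return positions


# Hypothetical toy isolates, already aligned to one reference.
isolates = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACG-ACTTAT",
}

distances = {
    (a, b): snp_distance(isolates[a], isolates[b])
    for a, b in combinations(isolates, 2)
}
snp_sites = variable_positions(isolates)
```

Because positions are reported against the reference, the same SNP keeps the same number no matter which isolates are compared.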
Gene-by-gene systems

• Gene-by-gene systems are the NGS


variant of “classical” Multi-Locus
Sequence Typing (MLST).
• Instead of using targeted sequencing of
specific loci, whole genomes are
shotgun sequenced and alleles
identified with respect to a reference set
of loci.
• A type is defined by a specific list of
allele numbers.
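The idea that a type is a specific list of allele numbers can be sketched in Python as a lookup on an ordered allele profile. The locus names, allele numbers, and type numbers below are hypothetical and do not come from any real MLST scheme.

```python
from typing import Optional

# Hypothetical scheme: four loci in a fixed order.
LOCI = ("locusA", "locusB", "locusC", "locusD")

# Hypothetical curated table mapping allele profiles to type numbers.
KNOWN_TYPES = {
    (1, 1, 2, 1): 10,
    (1, 3, 2, 1): 11,
}


def assign_type(profile: dict) -> Optional[int]:
    """Return the type for an allele profile, or None if the profile is not in the table."""
    key = tuple(profile.get(locus) for locus in LOCI)
    return KNOWN_TYPES.get(key)


def allele_distance(p: dict, q: dict) -> int:
    """Count loci where both isolates have an allele call and the calls differ."""
    return sum(
        1
        for locus in LOCI
        if p.get(locus) is not None
        and q.get(locus) is not None
        and p[locus] != q[locus]
    )


isolate_x = {"locusA": 1, "locusB": 1, "locusC": 2, "locusD": 1}
isolate_y = {"locusA": 1, "locusB": 3, "locusC": 2, "locusD": 1}
isolate_z = {"locusA": 1, "locusB": 3, "locusC": 7, "locusD": 1}  # novel profile

type_x = assign_type(isolate_x)
type_z = assign_type(isolate_z)
dist_xy = allele_distance(isolate_x, isolate_y)
```

Comparing allele numbers rather than raw bases means a locus counts once whether it differs by one substitution or by many.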
Gene-by-gene systems
Genome-wide SNPs vs gene-by-gene systems
Genome-wide SNPs
• Provide a very detailed answer to the subtyping question.
• Basic question: how many SNPs would be allowed between isolates from the same outbreak? Usually 5-35; may be up to 100.
• Another major drawback is that the data set in principle is unstable, because the SNP set (i.e. the number of polymorphisms found) depends on the set of isolates analyzed.
Gene-by-gene systems
• Resolution can be chosen by selecting an appropriate subset of loci, resulting in detailed to very detailed data.
• Issues with paralogous genes and multiple gene copies (e.g., in wgMLST, a certain gene may be present in some but not in all samples).
• The technique generates a stable data set that can be the basis for an international nomenclature, provided that allele IDs and sequence types are properly curated.
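The question of how many SNPs to allow between isolates from the same outbreak can be illustrated with a small Python sketch that groups isolates whose pairwise SNP distance falls at or below a chosen cutoff. Single-linkage grouping is an assumption made for this example, not a method stated in the slides, and the isolate names and distances are invented.

```python
def cluster_by_threshold(names, distances, threshold):
    """Group isolates joined by any chain of pairs within the SNP threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), snps in distances.items():
        if snps <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise SNP distances between four isolates.
names = ["A", "B", "C", "D"]
pairwise_snps = {
    ("A", "B"): 3,
    ("A", "C"): 12,
    ("B", "C"): 10,
    ("A", "D"): 250,
    ("B", "D"): 248,
    ("C", "D"): 251,
}

clusters_strict = cluster_by_threshold(names, pairwise_snps, threshold=5)
clusters_loose = cluster_by_threshold(names, pairwise_snps, threshold=35)
```

With a cutoff of 5, isolate C falls outside the A/B group; with a cutoff of 35 it joins it, which is why the choice of cutoff has to be justified for each organism and outbreak.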
Case studies in molecular
epidemiology:
tracking Salmonella and
Listeria to their source
The Salmonella ‘re-emergence’

Understanding how Salmonella
contamination found its way to the food supply is not an
easy task...
"The Smokin' Hot Pepper"

(Salmonella Saintpaul outbreak – Summer 2008)


Next-Generation Sequencing (NGS)
provides support for other technologies
and fosters novel targets and assay
design for rapid diagnostics.

[Diagram labels: NGS; MLVA and CRISPR loci; SNP discovery; biomarker assays; microarray targets; outbreak response]
FOODBORNE OUTBREAK
INVESTIGATION:
WGS analysis of foodborne salmonellae case study
This investigation focused on Salmonella
Montevideo samples associated with red
and black pepper used in the production
of Italian-style spiced meats in a New
England processing facility. This
manufacturer was implicated in a major
salmonellosis outbreak that affected more
than 272 people in 44 states and DC.
15-20x shotgun sequencing
35 pure culture isolates from patients,
foods and environmental samples.

Concatenate 40 variable genes for IN or OUT?


Phylogenetic analysis
A specific PFGE clone of Salmonella isolates forms a single point
or monophyletic cluster based on WGS
Case 2:

S. Senftenberg – “a piggy back serovar”


40 genes vary within an S. Montevideo outbreak,
while unique SNPs and a 100 kb insertion separate
a CA isolate from the outbreak
Sources track with clusters using WGS
Next-Generation sequencing can be used
to address different facets of outbreak
response:
• Have we seen this isolate before?
• (Compare it to reference isolates)
• Do these isolates form a cluster? (i.e. is it outbreak or
background)
• (Compare them to reference and other outbreak
isolates)
• Is there a similarity between food/environmental and
clinical isolates?
• (Compare them to reference, clinical, and
food/environmental isolates)
• Can we do this?
NGS Analysis Strategy

Several sets of isolates included to ensure that NGS can answer the questions that are being asked for this organism and/or serotype:
• Different PFGE
• Same PFGE but not related
• Event-related
S. Enteritidis

Several pattern combinations were found during


the 2010 egg outbreak, but JEGX01.0004 is 40%
of all of the S. Enteritidis seen in the PulseNet
database

XbaI (primary enzyme)   BlnI (secondary enzyme)
JEGX01.0004             JEGA26.0002
JEGX01.0004             JEGA26.0030
JEGX01.0004             JEGA26.0031
JEGX01.0034             JEGA26.0002
JEGX01.0104             JEGA26.0002

THE PERFECT STORM…


Salmonella Agona
1998/2008 Dry Cereal Outbreaks

[Figure: PFGE patterns with XbaI (primary enzyme) and BlnI (secondary enzyme). Lanes: Food 2008, Food 1998, Env. 2008, Env. 2008, Food 2008, Env. 2008]

SAME or DIFFERENT?
S. Agona clade:
Salmonella Agona Dry Cereal Outbreak

• Salmonella Agona strains could be readily


distinguished based on WGS analysis
underscoring the ability to differentiate and track
an isolate over time within the same facility.

• The re-emergence of the isolate in the facility may have resulted from a combination of factors: renovations in an older side of the plant, which included drilling into a previously well-sanitized wall, could have released the dormant pathogen, and wet-cleaning sanitation practices could have spread it throughout that part of the facility.
Newport Outbreak: Tomatoes
Once tomatoes reach the supply chain, things really “simplify”.

The Fresh-cut Tomato Supply Chain


The 10 riskiest foods

http://www.nextgenerationfood.com/news/risky-food-list/
http://www.cspinet.org/new/200910061.html
NGS distinguishes geographical structure among closely
related Salmonella Bareilly strains
Real-time Integration of WGS into
FDA regulatory workflow
Applications of WGS in the
food safety environment
• Delimiting scope and trace-back of
food contamination events
• Quality control for FDA testing and
surveillance
• Preventive control monitoring for
compliance standards
• ID, geno/pheno typing schemes
(AST, Serotyping, VP)
Salmonella Enteritidis outbreak
linked to a long-term care facility
Sept. 2010: Connecticut Dept. of Health identifies
a Salmonella outbreak in a long term care facility.
– Outbreak was linked to cannoli from a Westchester
bakery.
Cases were linked to another cluster in
Westchester, NY.

Both NY and CT cases consumed cannoli.

Isolates had a common PFGE / MLVA DNA
fingerprint.
NGS identifies additional outbreak cases

Contemporary isolates:
Two small clusters

Outbreak

Blue: Epidemiologically identified


Red: Additional cases identified with NGS
Listeria monocytogenes
• Gram-positive animal and human food-borne
pathogen
• Facultative intracellular pathogen
• Causes abortion, meningitis, and septicemia
• Can grow at low temperatures
• High infectious dose
• Causes an estimated 1,600 illnesses and 255
deaths/year in the US
• As of May 2013, 47 L. monocytogenes genomes
and 14 genomes for other Listeria spp. available in
GenBank
Public Health Impact of Molecular Epidemiology

[Figure: two epidemic curves, number of cases (0-70) by day of outbreak (days 1-71)]
• 1993 Western States E. coli O157 outbreak: 726 cases, 4 deaths; annotations: outbreak detected 1993, meat recall, 39 d
• 2002 Colorado E. coli O157 outbreak; annotations: outbreak detected 2002, 18 d
If only 5 cases of E. coli O157:H7 infections were averted by the recall of ground beef
in the Colorado outbreak, the PulseNet system would have recovered all costs for
start up and operation for 5 years.
(Elbasha et al. Emerg. Infect. Dis. 6:293-297, 2000)
Use of DNA fingerprinting to track L.
monocytogenes in processing plants
• Environmental Listeria contamination is a
significant problem in the food industry
• Controlling environmental L. monocytogenes
contamination in food plants is key to better
control (“Seek and destroy”)
DNA fingerprinting can identify
persistence in plants
Visit    Ribotype  Code  Sample source
VISIT 1  * 1039C   (E)   Floor drain, raw materials area
         * 1039C   (E)   Floor drain, hallway to finished area
         * 1039C   (IP)  Troll Red King Salmon, in brine, head area
         * 1039C   (IP)  Troll Red King Salmon, in brine, belly area
         * 1039C   (IP)  Brine, Troll Red King Salmon
         * 1039C   (IP)  Faroe Island Salmon, in brine, head area
         * 1039C   (F)   Smoked Sable
         * 1039C   (F)   Cold-Smoked Norwegian Salmon
VISIT 2    1044A   (E)   Floor drain, brining cold room 1
           1044A   (R)   Raw Troll Red King Salmon, head area
           1044A   (IP)  Brine, Faroe Island Salmon
           1045    (R)   Raw Troll Red King Salmon, belly area
           1045    (IP)  Faroe Island Salmon, in brine, head area
           1053    (IP)  Norwegian Salmon, in brine
           1062    (E)   Floor drain #1, raw materials preparation
         * 1039C   (E)   Floor drain #1, raw materials preparation
         * 1039C   (E)   Floor drain, brining cold room 1
         * 1039C   (E)   Floor drain #2, raw materials preparation
         * 1039C   (E)   Floor drain #2, raw materials receiving
         * 1039C   (E)   Floor drain, finished product area
VISIT 3  * 1039C   (E)   Floor drain, hallway to finished area
         * 1039C   (IP)  Brine, Troll Red King Salmon
         * 1039C   (F)   Smoked Sable
           1044A   (IP)  Sable, in brine
           1044A   (IP)  Brine, Faroe Island Salmon
           1062    (IP)  Brine, Norwegian Salmon
House bugs & pet Listeria
Ribotype prevalence (% of samples)
Ribotype  Plant B (n=129)  Plant C (n=173)  Plant D (n=229)  P-value
1039C     0.0              0.0              10.0             0.0000
1042B     0.8              1.2              0.4              0.8221
1042C     6.2              0.6              0.4              0.0003
1044A     0.0              2.3              3.1              0.1494
1045      5.4              0.0              0.9              0.0006
1046B     0.0              2.3              0.0              0.0144
1053      0.0              0.6              1.7              0.2686
1062      0.8              0.6              2.6              0.1822
L. monocytogenes persisted in rubber
floor mats despite sanitation

Listeria can be protected from sanitizer in “micro-cracks”, but can


be squeezed out by pressure if people stand on mats
2000 US outbreak - Environmental
persistence of L. monocytogenes?
• 1988: one human listeriosis case linked to hot dogs
produced by plant X
• 2000: 29 human listeriosis cases linked to sliced turkey
meats from plant X
Rapid Whole Genome Sequencing based
subtyping of L. monocytogenes
• DNA extraction and library prep (3 days)
• Sequencing on bench-top sequencer (MiSeq, Ion Torrent) (24 h)
• Analysis (12 h)
– De novo assembly
– Rapid classification to subpopulation using pairwise distances based on average nucleotide identity values (BLAST)
– Inference of subpopulation structure based on SNP calling
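As a rough illustration of the rapid classification step, the Python sketch below assigns a new assembly to the reference subpopulation with the highest average nucleotide identity (ANI). The ANI values, subpopulation labels, and the 95% cutoff are all hypothetical, and computing ANI from BLAST alignments is outside the sketch.

```python
from typing import Optional, Tuple


def classify_by_ani(ani_to_refs: dict, min_ani: float = 95.0) -> Tuple[Optional[str], float]:
    """Return the reference label with the highest ANI, or None if it is below min_ani."""
    if not ani_to_refs:
        raise ValueError("at least one reference comparison is required")
    best_label = max(ani_to_refs, key=ani_to_refs.get)
    best_ani = ani_to_refs[best_label]
    if best_ani < min_ani:
        # No reference is close enough; leave the isolate unclassified.
        return None, best_ani
    return best_label, best_ani


# Hypothetical ANI values (percent) of one query assembly against three references.
query_ani = {
    "subpopulation_1": 99.6,
    "subpopulation_2": 94.8,
    "subpopulation_3": 93.1,
}

label, ani = classify_by_ani(query_ani)
```

A coarse assignment like this only narrows down which reference to use; the finer subpopulation structure still comes from SNP calling.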

Collaboration with CDC (C. Tarr)


L. monocytogenes genome sequencing at
CDC
Summary

• Full genome sequencing has potential as both a


primary and as a secondary subtyping method in
outbreak investigations
– Reduces scope of outbreaks
– Identifies root causes, which prevents future outbreaks
• Other applications of genomics and molecular biology
can help prevent foodborne disease cases and
outbreaks
– Use of genomics and RNA-seq to develop control strategies
http://dashburst.com/infographic/big-data-volume-variety-velocity/
Overall summary and conclusions
• Genome sequencing is making “real world”
contributions to food safety
– Improved subtyping over PFGE
– Identification of better target genes for detection
• Translation of transcriptomics, metabolomics etc.
findings to improved prevention and treatment is
still in the early stages
• Genomics is only part of “big data”
– Future generations of food scientists need to be
able to play in the “big data” pond
