

Advances in Dental Research

Next-generation Sequencing Approaches to Understanding the Oral Microbiome

E. Zaura
ADR 2012 24: 81
DOI: 10.1177/0022034512449466

On behalf of:
International and American Associations for Dental Research


Version of Record: Aug 16, 2012


Downloaded from at Sichuan University on September 17, 2012 For personal use only. No other uses without permission.

2012 International & American Associations for Dental Research

Next-generation Sequencing Approaches to Understanding the Oral Microbiome

E. Zaura
Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam, University of Amsterdam and VU University Amsterdam, Netherlands
Adv Dent Res 24(2):81-85, 2012

Abstract
Until recently, the focus in dental research has been on studying a small fraction of the oral microbiome: the so-called opportunistic pathogens. With the advent of next-generation sequencing (NGS) technologies, researchers now have tools that allow microbiomes and metagenomes to be profiled at unprecedented depth. The major advantages of NGS are its high throughput and the fact that specific taxa do not need to be targeted. The relatively low cost and the availability of sequencing facilities have contributed to a nearly exponential growth of NGS datasets. The quality and interpretation of NGS data can be undermined at numerous steps, from sample collection, storage, and DNA extraction to PCR bias, sequencing errors, the choice of data-processing algorithms, and statistical analyses. Making sense of this data deluge is, and will remain, the major challenge. Community analyses based on systems ecology principles will bring us closer to understanding the underlying forces that facilitate the stability (or imbalance) of the microbiome. The next logical step will take us beyond the microbiome. The integration of bacterial, viral, and fungal meta-omes such as the meta-transcriptome, meta-proteome, and meta-metabolome, together with the host as a major co-factor, should be the ultimate goal in unraveling the complexity of the oral interactome.

Key Words
microbiome, metagenome, high-throughput nucleotide sequencing, bioinformatics, data analysis, data quality.

Since the introduction of Koch's postulates to the microbial etiology of infectious diseases, the focus in medical microbiology has been on identifying pathogens and deciphering their virulence. Oral microbiology followed in the footsteps of medical microbiology, and research efforts have resulted in a wealth of knowledge on opportunistic pathogens within the commensal oral microbiota. A milestone in microbiology was the understanding that bacterial cells are essential to the well-being of their host. Since these microscopic inhabitants of our body outnumber our own cells, we are, in fact, walking microbes. For decades, however, assessment of the microbiota of the human body was limited to the fraction of species and strains that could be isolated in laboratory culture. Since the 1980s, when it became possible to determine the nucleotide order by sequencing, complete microbial genomes and gene fragments (small subunit ribosomal RNA genes) have been sequenced. This has provided information on the broad diversity of microbial taxa regardless of their cultivability: there are now twice as many oral microbial species identified only by cloning and sequencing as by culturing (Dewhirst et al., 2010).

New possibilities in microbiome studies opened up during the last decade with the advent of high-throughput genome sequencing, also known as next-generation sequencing (NGS) (for an extensive review of the methodology and recent examples of its application in the oral field, see Siqueira et al., 2012). In this overview, the basic principles of the technology and of data analysis are described, followed by the most common pros and cons and the potential improvements and solutions to the shortcomings encountered. Finally, research perspectives beyond the microbiome are addressed.

Amplicon Sequencing for Taxonomic Profiling of the Microbiome
Fragments of genomic DNA are sequenced in a massively parallel way, resulting in gigabases of sequence data at an unprecedented sequencing depth compared with the conventional cloning and sequencing approach. For bacterial community profiling (Figure, panel A), the hypervariable regions of the small subunit ribosomal RNA gene, the 16S rRNA gene, are used. These regions are flanked by conserved parts of the 16S rRNA gene, which are used in primer design to target as diverse a bacterial population as possible (so-called universal bacterial primers). The sequences of the hypervariable regions themselves are used to discriminate among bacterial taxa. The resulting sequence data require processing through a bioinformatics pipeline. This pipeline should ensure that low-quality sequences are discarded and that meaningful groups, or clusters, of sequences (operational taxonomic units, OTUs) are created. The representative sequence of each OTU is then compared with sequences in publicly available databases, e.g., the Ribosomal Database Project (RDP) (Cole et al., 2009), and, when possible, the OTU is assigned a consensus taxonomic lineage (genus, family, or higher taxon).
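
The OTU clustering step of the pipeline described above can be illustrated with a minimal greedy, centroid-based sketch in Python. It is not the algorithm of any particular tool. It assumes the reads have already been quality-filtered and aligned to equal length, measures identity simply as the fraction of matching positions (real tools use pairwise alignment), and uses the commonly applied 97% threshold; the function names and example reads are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length reads.

    Simplification: reads are assumed to be pre-aligned and trimmed to the
    same length, so no pairwise alignment is computed here.
    """
    if len(a) != len(b):
        raise ValueError("reads must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(reads: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Cluster reads into OTUs with a greedy, abundance-ordered centroid scheme.

    Returns a mapping from each OTU's representative (centroid) sequence to
    the unique sequences assigned to it.
    """
    # Dereplicate: count identical reads.
    counts: Dict[str, int] = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1

    # The most abundant unique sequences are considered first, so they become centroids.
    uniques = sorted(counts, key=lambda s: (-counts[s], s))

    otus: Dict[str, List[str]] = {}
    for seq in uniques:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)  # join the first matching OTU
                break
        else:
            otus[seq] = [seq]  # no centroid within threshold: start a new OTU
    return otus


# Hypothetical 40-nt reads: one substitution gives 39/40 = 97.5% identity.
base = "ACGT" * 10
variant = "T" + base[1:]        # 1 mismatch vs. base -> same OTU
distinct = "GGGGG" + base[5:]   # 4 mismatches vs. base -> new OTU
otus = greedy_otu_clustering([base] * 5 + [variant] * 2 + [distinct] * 3)
for centroid, members in otus.items():
    print(centroid[:10], len(members))
```

Because the abundant sequence is processed first, the rarer single-mismatch variant is absorbed into its OTU rather than founding one of its own; the representative sequence of each OTU (the dictionary key) is what would then be compared against a reference database such as RDP.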

Currently, two approaches are used to analyze microbial community sequence data. In the OTU-based analysis, each OTU is treated equally. This way, poorly defined microbial ecosystems can be compared without the need to exclude undefined, potentially novel phylotypes that cannot be assigned a consensus taxonomic lineage. The other approach is based on phylogeny: the phylogenetic information of each OTU is used to account for the degree of divergence between sequences. Variation in 16S rRNA sequences has been shown to correlate positively with phenotypic variation of microbiota (Nubel et al., 1999). In the phylogenetic approach, different communities or sample types are compared based on their evolutionary distances.

Regardless of which approach is used, a broad range of diversity measures can be applied to the resulting sequence dataset. Alpha diversity refers to diversity within each sample, e.g., species richness, species evenness, and overall diversity. The oral microbiome, for instance, is a microbial ecosystem with low evenness: a few taxa, such as streptococci, veillonellae, and prevotellae, dominate samples obtained from dental plaque or saliva (Keijser et al., 2008; Crielaard et al., 2011). Alpha diversity can be used to describe the stability of a particular ecosystem: highly diverse ecosystems are considered more stable, or healthier, than communities dominated by a few taxa.

Beta diversity considers the differences (both qualitative and quantitative) between environments. Beta-diversity tests include statistical comparisons that assess the association of a given microbial profile with a given clinical status. One widely accepted method, UniFrac (Lozupone and Knight, 2005), is phylogeny-based and can detect qualitative (unweighted UniFrac) and quantitative (weighted UniFrac) differences among sample groups (Lozupone et al., 2007). Making sense of this data deluge is, and will remain, the major challenge. Besides microbiology, training in bioinformatics and molecular microbial ecology will become mandatory for the researchers of today and tomorrow.

PROs of Amplicon Sequencing by the NGS Approach
Amplicon sequencing by the NGS approach provides high-throughput sequence information at exceptional depth. Hundreds of thousands of sequences can be obtained from a single sample (Keijser et al., 2008), or, if required, sample-specific nucleotide barcodes can be added so that multiple samples can be pooled in one sequencing run. This reduces the sequencing depth per sample but, at the same time, also the cost per sample. There is a trade-off, however: at reduced sequencing depth, one may lack the power to discriminate the effects of subtle interventions, such as supplementation with pre- and probiotics. Major ecological shifts, such as the effects of potent antimicrobials, would still be discernible at relatively low sequencing depth (Kuczynski et al., 2010). The number of reads required per sample, however, needs to be validated for each intervention.

The results of NGS are not biased toward fast-growing, easily cultivable taxa, and they allow for hypothesis-driven research on previously unknown and unclassified micro-organisms. Unlike, e.g., the microarray approach, NGS is not restricted to taxa selected a priori. This gives an open-ended view of the whole breadth of the microbiome and provides a new opportunity for oral-health-related studies. Research questions on microbial stability and on ecological shifts due to environmental and host factors can now be addressed at the full scale of community complexity.

CONs of Amplicon Sequencing in Microbiome Studies and Ways to Increase the Success Rate
The sequencing throughput of NGS provides exciting opportunities for research. The downside of this enthusiasm is that the quality of the data generated is often undermined by failure to address fundamental aspects of experimental design (Rogers and Bruce, 2010). Only 18% of sequencing studies published in 2009 in major ecological journals analyzed replicate samples (Prosser, 2010). This may be due to a belief that these cutting-edge technologies are exempt from normal standards and that the costs involved justify the lack of proper design (Prosser, 2010). With sequencing costs decreasing and sample identification tags available, a lack of sample replicates should not be accepted. As in any conventional study, the statistical analysis should be planned in advance, before the study starts.

The major disadvantage of current NGS platforms for amplicon sequencing is the short read length, which precludes accurate taxonomic identification and results in low taxonomic resolution. Most reads can be identified to genus level, but only a fraction to species level, depending on the 16S rRNA gene sequence homology among members of the same genus. Depending on which hypervariable region of the 16S rRNA gene is targeted, different taxa are either missed or can be classified only at a higher taxonomic level (family, class, or even phylum). The problem can be diminished by in silico analyses of 16S rRNA sequences before the target region is selected (Brandt et al., 2012). However, some taxa, such as several closely related streptococci, will be hard or even impossible to distinguish by 16S rRNA gene sequence. In that case, either a more specific gene should be targeted instead of the small subunit rRNA gene, or full metagenomic sequencing (described below) should be applied.

Like any other molecular-biology-based method, NGS suffers from the typical biases of DNA extraction and amplification (Hong et al., 2009; Nadkarni et al., 2009), such as primer selectivity and intrinsic differences in the amplification efficiency of templates. Unfortunately, bias from DNA amplification by polymerase chain reaction (PCR) can be avoided only by avoiding the amplification step itself. This would mean choosing the full metagenomic sequencing approach instead of amplicon sequencing of hypervariable regions of the 16S rRNA gene.

The quality of the DNA extraction protocol should be evaluated in advance. In standard protocols, all DNA, including DNA from dead or damaged cells and from the extracellular matrix, is used in the downstream analysis. In intervention studies where the treatment has antimicrobial potential, this leads to underestimation of the effects of the intervention, because DNA from killed cells is still detected. The solution lies in removing all DNA that does not originate from intact cells before the DNA extraction process. One such approach is to incubate samples with propidium monoazide (PMA) (Loozen et al., 2011), which binds to extracellular DNA and to the DNA of cells that have lost their structural integrity. Exposure to visible light makes this reaction irreversible and renders the PMA-bound DNA unable to act as a template for PCR amplification (Rogers and Bruce, 2010). Membrane integrity, however, is considered a conservative criterion for microbial viability, and other approaches related to cell activity have been proposed (Nocker and Camper, 2009).

Figure. Flow diagram of two next-generation sequencing approaches for microbiome studies. (A) Sequencing of DNA amplicons targeting specific 16S rRNA gene fragments (hypervariable regions). The obtained sequence data are compared with sequences in small subunit ribosomal RNA gene databases and are used for taxonomic profiling and diversity analyses. (B) Direct sequencing of random DNA fragments, also called metagenomic shotgun sequencing. A sequence contig is a contiguous, overlapping sequence resulting from the re-assembly of the small DNA fragments. The obtained sequence data are compared with full-genome reference databases and are used to describe the predominant functions of the microbial communities, as well as to identify the microbial taxa. Steps indicated in gray differ between the two methods.

Any sequencing method, but especially high-throughput NGS technologies, suffers from sequencing errors (Kunin et al., 2010; Balzer et al., 2011). Although the newest-generation technologies can generate reads up to 1,000 bp long, there is a trade-off: the error rate increases with read length. Data pre-processing after sequencing is used to minimize inaccuracies in the output. Low-quality reads, reads shorter or longer than set length cutoffs, and reads with ambiguous base calls or homopolymers are usually filtered out of the dataset. After these pre-processing steps, the dataset is frequently used for community analyses, although more recent reports indicate the need for additional cleaning steps (Kunin et al., 2010). PyroNoise (Quince et al., 2009) and Denoiser (Reeder and Knight, 2010) are two examples of tools applied to remove sequencing noise. In this process, however, valid low-abundance sequences are also denoised to appear more similar to reads that are found at higher abundance and are assumed to be error-free. This, in turn, may lead to clustering of the sequences into fewer OTUs and to underestimation of sample diversity.

Besides denoising, another cleaning step is the identification and removal of chimeric sequences (Edgar et al., 2011; Haas et al., 2011). Chimeras are sequences created during PCR amplification from two or more templates instead of a single parent template. The exact causes of chimera formation are not well understood, but it has been associated with PCR conditions, fragment length, and, possibly, sample composition.

After the cleaning steps, the sequences are clustered into OTUs at a predetermined similarity level, usually 97%. The specific choice of clustering algorithm has been shown to affect the output, either under- or overestimating the diversity of the sample (Sun et al., 2011). New, improved algorithms keep appearing, which precludes direct comparison with earlier results. For such comparisons, all steps of the pipeline, from sequence pre-processing through cleaning and clustering, should be performed on all datasets at once.

Thus far, statistical data analyses have overlooked the fact that microbiome profiles are not representations of absolute measurements (e.g., microbial counts) but typical examples of compositional data, expressed as relative proportions of taxa. Whether a taxon is observed or missed depends on the sampling effort, amplification bias, and sequencing depth. Because the proportions in each sample sum to one, an increase in the relative abundance of some taxa is necessarily accompanied by a decrease in the relative abundance of others. This may lead to statistically significant, though spurious, correlations without any biological dependence among the taxa involved. Computational tools suitable for compositional data analysis should be developed and implemented.
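
The compositional effect described above can be reproduced in a few lines of Python. The abundances below are hypothetical, invented for illustration: taxon A expands while taxon B fluctuates independently of it, yet once the counts are converted to proportions, A and B appear strongly negatively correlated. The centered log-ratio (CLR) transform at the end is one standard tool from compositional data analysis (Aitchison's log-ratio approach), assumed here as an example rather than a method proposed in this article. It does not recover absolute abundances, but because it depends only on ratios between taxa, it gives the same result for raw counts and for proportions, i.e., it is unaffected by an arbitrary total such as sequencing depth.

```python
import math
from typing import Dict, List

Profile = Dict[str, float]


def closure(counts: Profile) -> Profile:
    """Convert absolute counts to relative proportions that sum to one."""
    total = sum(counts.values())
    return {taxon: value / total for taxon, value in counts.items()}


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def clr(parts: Profile) -> Profile:
    """Centered log-ratio transform: log of each part minus the mean log.

    Requires strictly positive values; zero counts would need separate
    handling (e.g., a pseudocount), which is a further assumption.
    """
    if any(v <= 0 for v in parts.values()):
        raise ValueError("CLR requires strictly positive values")
    logs = {taxon: math.log(v) for taxon, v in parts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: v - mean_log for taxon, v in logs.items()}


# Hypothetical absolute abundances in four samples: taxon A expands,
# B fluctuates independently of A, and C stays constant.
samples = [
    {"A": 100, "B": 100, "C": 50},
    {"A": 200, "B": 110, "C": 50},
    {"A": 400, "B": 90, "C": 50},
    {"A": 800, "B": 105, "C": 50},
]
proportions = [closure(s) for s in samples]

r_absolute = pearson([s["A"] for s in samples], [s["B"] for s in samples])
r_relative = pearson([p["A"] for p in proportions], [p["B"] for p in proportions])
print(f"A vs B, absolute counts: r = {r_absolute:.2f}")
print(f"A vs B, proportions:     r = {r_relative:.2f}")
```

The absolute counts of A and B are nearly uncorrelated, while their proportions are almost perfectly anticorrelated, purely because B's share shrinks as A grows. This is the kind of statistically significant but biologically meaningless correlation the text warns about.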
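
Because whether a taxon is observed depends on sequencing depth, the alpha-diversity measures discussed earlier (richness, evenness, and diversity) are usually compared at equal depth. The Python sketch below subsamples reads without replacement to a fixed depth (rarefaction, one common way to equalize depth) and computes observed richness, Shannon diversity, and Pielou's evenness. These particular indices, the function names, and the OTU count table are assumptions chosen for illustration, not choices prescribed in the text.

```python
import math
import random
from collections import Counter
from typing import Dict, List


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Randomly subsample `depth` reads (without replacement) from a count table."""
    reads: List[str] = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    return dict(Counter(rng.sample(reads, depth)))


def richness(counts: Dict[str, int]) -> int:
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts: Dict[str, int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def pielou_evenness(counts: Dict[str, int]) -> float:
    """Pielou's evenness J = H / ln(S); 1.0 means all observed OTUs are equally abundant."""
    s = richness(counts)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two OTUs")
    return shannon(counts) / math.log(s)


# Hypothetical sample of 10,000 reads dominated by one OTU, plus 50 rare OTUs of 2 reads each.
sample = {"dominant": 9000, "common": 900}
sample.update({f"rare_{i}": 2 for i in range(50)})

for depth in (100, 1000, 10000):
    sub = rarefy(sample, depth)
    print(depth, richness(sub))
print("evenness of full sample:", round(pielou_evenness(sample), 2))
```

At low depth, most rare OTUs are typically not drawn, so observed richness falls well below the 52 OTUs present in the full sample, whereas the dominant OTU is detected at any depth in practice. The hypothetical sample was constructed to have low evenness, like the plaque and saliva communities described earlier.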

Direct Shotgun Sequencing of the Collective Genome of the Microbiome (Metagenome)
Amplicon sequencing (sequencing of a targeted fragment of the 16S rRNA gene) provides information only on the microbial taxa (the players) in the community. With NGS technologies, whole genomic DNA can be sequenced without the targeting step and thus without the PCR bias of amplicon sequencing (Figure, panel B). Instead of sequencing a single isolated clone (as in the traditional cloning and sequencing approach), the genome of the entire community (the metagenome) is sequenced. For this, extracted DNA is sheared into random fragments and sequenced directly. Contaminating DNA, e.g., human DNA, should be removed either before sequencing (Hunter et al., 2011) or afterward by filtering the data. Extensive bioinformatics steps then follow, in which DNA fragments need to be assembled into genomes against a reference genome database, followed by functional gene annotation (Mitra et al., 2011; Simon and Daniel, 2011). The major difficulty of this approach is genome assembly from incomplete information. First, sampling is incomplete, and most genomes are only partially sequenced, if at all: sequences originating from predominant micro-organisms dominate the data, while less abundant species might be missed if the sequencing depth is insufficient. Second, the information on individual genomes (reference genomes) is often incomplete, inaccurate, and inconsistently annotated, which may preclude mapping individual reads to their species of origin. Nevertheless, impressive insights have already been obtained in research on the gastrointestinal tract (Arumugam et al., 2011). In the oral field, the stage has only just been set (Xie et al., 2010; Belda-Ferre et al., 2011).

Toward the Interactome
Future developments should move us toward translating genotype into phenotype (Henry et al., 2011). The question we should answer is "What are they doing?" instead of just "Who is there?" Researchers' expertise will need to extend beyond microbiology and cariology to the application of systems ecology principles. An integrated systems biology and ecology approach should bring us closer to understanding the underlying forces that facilitate the stability (or imbalance) of the microbiome. The integration of bacterial, viral, and fungal meta-omes such as the meta-transcriptome, meta-proteome, and meta-metabolome, together with the host as a major co-factor, should be the ultimate goal in unraveling the complexity of the oral interactome.

Acknowledgments
The author received no financial support and declares no potential conflicts of interest with respect to the authorship and/or publication of this article.

References
Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, et al. (2011). Enterotypes of the human gut microbiome. Nature 473:174-180 (erratum in Nature 474:666, 2011).
Balzer S, Malde K, Jonassen I (2011). Systematic exploration of error sources in pyrosequencing flowgram data. Bioinformatics 27:i304-i309.
Belda-Ferre P, Alcaraz LD, Cabrera-Rubio R, Romero H, Simon-Soro A, Mira A, et al. (2011). The oral metagenome in health and disease. ISME J 6:46-56.
Brandt BW, Bonder MJ, Huse SM, Zaura E (2012). TaxMan: a server to trim rRNA reference databases. Nucleic Acids Res [Epub ahead of print 5/22/2012] (in press).
Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, et al. (2009). The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37:D141-D145.
Crielaard W, Zaura E, Schuller A, Huse S, Montijn R, Keijser BJ, et al. (2011). Exploring the oral microbiota of children at various developmental stages of their dentition in the relation to their oral health. BMC Med Genomics 4:22.
Dewhirst FE, Chen T, Izard J, Paster BJ, Tanner AC, Yu WH, et al. (2010). The human oral microbiome. J Bacteriol 192:5002-5017.
Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics.
Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Knight R, et al. (2011). Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 21:494-504.
Henry CS, Overbeek R, Xia F, Best A, Glass E, Gilbert J, et al. (2011). Connecting genotype to phenotype in the era of high-throughput sequencing. Biochim Biophys Acta 1810:967-977.
Hong S, Bunge J, Leslin C, Jeon S, Epstein SS (2009). Polymerase chain reaction primers miss half of rRNA microbial diversity. ISME J 3:1365-1373.
Hunter SJ, Easton S, Booth V, Henderson B, Wade WG, Ward JM, et al. (2011). Selective removal of human DNA from metagenomic DNA samples extracted from dental plaque. J Basic Microbiol 51:442-446.
Keijser BJ, Zaura E, Huse SM, van der Vossen JM, Schuren FH, ten Cate JM, et al. (2008). Pyrosequencing analysis of the oral microflora of healthy adults. J Dent Res 87:1016-1020.
Kuczynski J, Costello EK, Nemergut DR, Zaneveld J, Lauber CL, Knights D, et al. (2010). Direct sequencing of the human microbiome readily reveals community differences. Genome Biol 11:210.
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P (2010). Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118-123.
Loozen G, Boon N, Pauwels M, Quirynen M, Teughels W (2011). Live/dead real-time polymerase chain reaction to assess new therapies against dental plaque-related pathologies. Mol Oral Microbiol 26:253-261.
Lozupone C, Knight R (2005). UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71:8228-8235.
Lozupone CA, Hamady M, Kelley ST, Knight R (2007). Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial communities. Appl Environ Microbiol 73:1576-1585.
Mitra S, Rupek P, Richter D, Urich T, Gilbert J, Meyer F, et al. (2011). Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG. BMC Bioinformatics 12(Suppl 1):S21.
Nadkarni MA, Martin FE, Hunter N, Jacques NA (2009). Methods for optimizing DNA extraction before quantifying oral bacterial numbers by real-time PCR. FEMS Microbiol Lett 296:45-51.
Nocker A, Camper AK (2009). Novel approaches toward preferential detection of viable cells using nucleic acid amplification techniques. FEMS Microbiol Lett 291:137-142.
Nubel U, Garcia-Pichel F, Kuhl M, Muyzer G (1999). Quantifying microbial diversity: morphotypes, 16S rRNA genes, and carotenoids of oxygenic phototrophs in microbial mats. Appl Environ Microbiol 65:422-430.
Prosser JI (2010). Replicate or lie. Environ Microbiol 12:1806-1810.
Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM, et al. (2009). Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6:639-641.
Reeder J, Knight R (2010). Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Methods 7:668-669.
Rogers GB, Bruce KD (2010). Next-generation sequencing in the analysis of human microbiota: essential considerations for clinical application. Mol Diagn Ther 14:343-350.
Simon C, Daniel R (2011). Metagenomic analyses: past and future trends. Appl Environ Microbiol 77:1153-1161.
Siqueira JF Jr, Fouad AF, Rocas IN (2012). Pyrosequencing as a tool for better understanding of human microbiomes. J Oral Microbiol [Epub ahead of print 1/23/2012] (in press).
Sun Y, Cai Y, Huse SM, Knight R, Farmerie WG, Mai V, et al. (2011). A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. Brief Bioinform 13:107-121.
Xie G, Chain PS, Lo CC, Liu KL, Gans J, Qui F, et al. (2010). Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing. Mol Oral Microbiol 25:391-405.
