
Harvard T.H. Chan School of Public Health

Microbiome Analysis Core

Jeremy E. Wilkinson¹, Lauren J. McIver¹, Kelsey N. Thompson¹,², Chengchen Li¹, Curtis Huttenhower¹,²

¹Department of Biostatistics, Harvard T.H. Chan School of Public Health
²Broad Institute of MIT and Harvard

The Microbiome Analysis Core at the Harvard T.H. Chan School of Public Health was established in response to the rapidly emerging field of microbiome research and its potential to affect studies across the biomedical sciences. The Core's goal is to aid researchers with microbiome study design and interpretation, reducing the gap between primary data and translatable biology. The Microbiome Analysis Core provides end-to-end support for microbial community and human microbiome research, from experimental design through data generation, bioinformatics, and statistics. This includes general consulting, power calculations, selection of data generation options, and analysis of data from amplicon (16S/18S/ITS), shotgun metagenomic sequencing, metatranscriptomics, metabolomics, and other molecular assays.

The Microbiome Analysis Core has extensive experience with microbiome profiles in diverse populations, including taxonomic and functional profiles from large cohorts, qualitative ecology, multi'omics and meta-analysis, and microbial systems and human epidemiological analysis. By integrating microbial community profiles with host clinical and environmental information, we enable researchers to interpret molecular activities of the microbiota and assess its impact on human health.

Microbial community profiling

The first step in microbiome molecular data analysis is quality control (KneadData) and profiling to transform raw data into biologically interpretable features using a reproducible workflow (AnADAMA2). This includes identifying microbial species (MetaPhlAn2) and strains (PanPhlAn/StrainPhlAn), characterizing their functional potential or activity (HUMAnN2, ShortBRED), and integrating metagenomics with other data types (PICRUSt, MelonnPan), among others.

[Figure: Microbial multi'omics. Panels:
(a) Sampling and multi'omic profiling systems in microbiome epidemiology: metagenomics (microbial species, strains, and gene products), metatranscriptomics (microbial gene expression profiles), metaproteomics (microbial protein profiles), metametabolomics (microbial metabolite concentrations), and metadata (multivariate covariates: host genetics, host epigenetics, host gene expression, host immune profile, host clinical information, host demographics).
(b) Multi'omic data integration: microbial species abundance, strain profiles, gene products, gene expression profiles, protein profiles, metabolite concentrations, and multivariate covariates combined into integrated multi'omic profiles.
(c) Normalization: feature abundance across subjects, before and after.
(d) Quantitative analysis: ordination (Component 1 vs. Component 2) and disease association (positive or negative) of feature abundance in cases vs. controls.
(e) Meta-analysis of multi'omic datasets: synthesizing evidence from multiple studies, combining new and literature case/control datasets and reference 'omics profiles into pooled 'omics profiles, where the same feature is measured in multiple cohorts.
(f) Experimental validation: prioritized microbial features tested in vivo (gnotobiotic mice) and in vitro (mammalian cell culture) with phenotypic readouts.]
The Harvard Chan Microbiome Analysis Core supports microbiome analysis for a variety of molecular data types in human populations or in model systems. Typical analysis workflow steps include a) molecular data generation of a variety of types, including but not limited to sequencing, which are b) bioinformatically processed into biologically interpretable features and c) quality controlled per dataset. This permits d) microbiome-tailored statistical methods to associate molecular features with covariates and outcomes, and optionally e) meta-analysis of multiple data types per project or across multiple projects. Finally, f) the Core can assist with study design for downstream evaluation of statistical associations in in vivo or in vitro model systems.

Downstream analysis and statistics

Once profiled, microbial communities are amenable to downstream statistics and visualization much like other molecular epidemiology such as human genetic or transcriptional profiles. Like these other data types, microbial communities often require tailored statistics for environmental, exposure, or phenotype association (LEfSe, MaAsLin) or for ecological interaction discovery (BAnOCC). The Harvard Chan Microbiome Analysis Core also provides a variety of tools for bioinformaticians working in the microbiome space.

McIver, L. J. et al. bioBakery: a meta'omic analysis environment. Bioinformatics, 34:7, 1235-1237 (2018).

Core services

Consultation for microbiome project development. This includes consultation on experimental design, sample collection and sequencing, grant proposal development, study power estimation, bioinformatics, and statistical data analysis.

Validated end-to-end meta'omic analysis of microbial community data. Using open-source analytical methods developed in the Huttenhower laboratory and by other leaders in the field, we provide cutting-edge microbiome informatics and analysis.

Support fully-collaborative grant-funded investigations. Includes preliminary data development, hypothesis formulation, grant narrative development, data analysis and inference, custom software development, and co-authored dissemination of findings.

Director: Jeremy E. Wilkinson
Senior Software Developer: Lauren J. McIver
Postdoctoral Fellow and Data Analyst: Kelsey N. Thompson
Research Project Manager and Data Analyst: Chengchen (Cherry) Li
Scientific Director: Curtis Huttenhower

The Harvard Chan Microbiome Analysis Core is a part of the Harvard Chan Microbiome in Public Health Center (HCMPH). Want to learn more? Visit https://hcmph.sph.harvard.edu

https://hcmph.sph.harvard.edu/hcmac
http://huttenhower.sph.harvard.edu

[Figure: Strain-level and multi'omic expression analysis. Panels:
(a) Strain-level features distinguish cases and controls within a common microbial species (species X).
(b) Reference-based strain profiling: map sample reads to a reference pangenome; score genes as confidently detected (well covered) or confidently absent; call and compare single-nucleotide variants (SNVs) in conserved marker genes. Yields gene-level and SNV-level strain variation (phenotype-associated).
(c) Assembly-based strain profiling: assemble sample reads into contigs, call genes (ORFs), and annotate against reference databases to reveal variation in gene content and order (synteny); SNV-level variation can be directly inferred from assembled haplotypes. Yields syntenic variation, novel gene diversity, and SNV-level strain variation (phenotype-associated).
(d) Multiomic relative expression analysis: metagenomic abundances (DNA), metatranscriptomic abundances (RNA), and relative expression (RNA/DNA ratio) of microbiome features (e.g. taxa, genes, pathways) in 5 controls and 5 cases. Example features: a globally RNA-abundant feature explained by high DNA copy number; a globally under-expressed feature (DNA abundance exceeds RNA abundance); a globally over-expressed feature (RNA abundance exceeds DNA abundance); a feature depleted in case RNA due to reduced case DNA copies; a feature depleted in case RNA due to under-expression in cases; a feature enriched in case RNA due to over-expression in cases.
(e) Mechanisms of strain-level transcriptional variation relative to a reference (genes 1, 2, 3): Case 1, gene 2 over-expressed (up-regulated); Case 2, gene 2 under-expressed (down-regulated); Case 3, gene 2 under-expressed (not encoded by strain).]

Shotgun metagenomic and metatranscriptomic sequence data are particularly amenable to detailed computational analysis, including multiple complementary methods for a) strain tracking or differential microbial expression. b) Reference-based methods can identify strains using either single nucleotide or structural (genomic) variants, and c) can be used in tandem with assembly-based methods for novel microbial discovery. d) Whole-community microbial differential expression can additionally be detected either in tandem with or in addition to metagenomic copy number changes, and e) analyzed per gene, pathway, microbe, or human individual.

Mallick, H. et al. Experimental design and quantitative analysis of microbial community multiomics. Genome Biology. 18:228 (2017).
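The relative expression idea described above, RNA abundance interpreted relative to metagenomic (DNA) copy number, can be illustrated with a minimal Python sketch. This is not the Core's implementation: the function name `relative_expression`, the dictionary inputs, the log2 scale, and the pseudocount are all assumptions made for illustration.

```python
import math


def relative_expression(rna, dna, pseudocount=1e-6):
    """Return log2(RNA/DNA) per feature for a single sample.

    rna, dna: dicts mapping a feature name (e.g. a gene or pathway)
    to its relative abundance in that sample.
    pseudocount: hypothetical small constant that avoids division by
    zero and log of zero; its value here is an arbitrary assumption.
    """
    ratios = {}
    for feature, rna_abundance in rna.items():
        if feature not in dna:
            # Without a DNA measurement the RNA level cannot be
            # normalized to copy number, so the feature is skipped.
            continue
        ratio = (rna_abundance + pseudocount) / (dna[feature] + pseudocount)
        ratios[feature] = math.log2(ratio)
    return ratios


# Hypothetical abundances for one sample (illustrative values only).
sample_rna = {"gene_a": 0.4, "gene_b": 0.1}
sample_dna = {"gene_a": 0.4, "gene_b": 0.4}
for name, value in relative_expression(sample_rna, sample_dna).items():
    print(name, round(value, 2))
```

Under these assumptions, a value near zero suggests RNA abundance is explained by DNA copy number, a positive value suggests over-expression, and a negative value suggests under-expression. Comparing these values between cases and controls would be a separate statistical step.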
