
Overview of available metagenomic analysis tools

Oksana Lukjancenko

Outline

• General overview of the methods
• Classification methods
• Commonly used sequence search algorithms

Metagenomic analysis methods

[Diagram: Reads are processed either by Assembly, followed by Binning and Annotation, or by Classification methods: Sequence similarity-based, Sequence composition-based, Marker-based, and Hybrid.]
Sequence similarity-based methods

A homology search (comparison) of each read against a database of reference organisms

Disadvantage: Cannot identify organisms that are not present in the reference database
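A minimal sketch of the idea, assuming a toy in-memory reference set and an ungapped identity score standing in for a real homology search such as BLAST. The names `REFERENCES`, `best_identity`, `classify_by_best_hit` and the 0.9 identity threshold are hypothetical. Each read is assigned to its best-scoring reference; a read from an organism missing from the database stays unclassified, which is exactly the disadvantage noted above.

```python
# Hypothetical reference "database": taxon name -> reference sequence.
REFERENCES = {
    "taxon_A": "ACGTACGTTTGACCA",
    "taxon_B": "GGGCCCAAATTTGGG",
}


def best_identity(read, ref):
    """Fraction of matching bases at the best ungapped placement of read within ref."""
    if not read or len(read) > len(ref):
        return 0.0
    best = 0
    for offset in range(len(ref) - len(read) + 1):
        window = ref[offset:offset + len(read)]
        matches = sum(a == b for a, b in zip(read, window))
        best = max(best, matches)
    return best / len(read)


def classify_by_best_hit(read, references, min_identity=0.9):
    """Return the reference with the highest identity, or None if no hit passes the threshold."""
    best_name, best_score = None, 0.0
    for name, seq in references.items():
        score = best_identity(read, seq)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_identity else None


print(classify_by_best_hit("ACGTTTGA", REFERENCES))  # taxon_A
print(classify_by_best_hit("TTTTTTTT", REFERENCES))  # None: no similar reference
```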

Sequence composition-based methods

• Based on characteristics of the nucleotide composition (e.g. GC% or codon usage)
• Find the best-fitting model for each sequence read

Disadvantage: Short reads (<1000 bp) are not suited for this method
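One simple way to "find the best fitting model" is a nearest-centroid classifier over k-mer frequencies. This is an illustrative assumption, not the model used by any particular published tool: each hypothetical taxon model is the mean dinucleotide profile of its training fragments, and a read goes to the closest model by Euclidean distance. `gc_content` shows the simplest composition feature (GC%).

```python
import math
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases (the simplest composition feature)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def kmer_profile(seq, k=2):
    """Relative frequency of every possible DNA k-mer (4**k values) in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def build_models(training, k=2):
    """One model per taxon: the mean k-mer profile of its training fragments."""
    models = {}
    for taxon, fragments in training.items():
        profiles = [kmer_profile(f, k) for f in fragments]
        models[taxon] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return models


def classify_by_composition(read, models, k=2):
    """Pick the taxon whose model profile is closest (Euclidean distance) to the read's."""
    profile = kmer_profile(read, k)
    return min(models, key=lambda taxon: math.dist(profile, models[taxon]))


# Hypothetical training data: one GC-rich and one AT-rich genome fragment.
MODELS = build_models({
    "GC_rich": ["GCGCGGCCGCGGGCCC"],
    "AT_rich": ["ATATTAATTTAAATAT"],
})
print(classify_by_composition("GGCGCCGC", MODELS))  # GC_rich
```

A short read contributes only a handful of k-mers, so its profile is a noisy estimate; that is one way to see why short reads suit this approach poorly.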
Hybrid methods

Hybrid methods combine elements of both similarity-based and composition-based methods
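A hedged way to picture a hybrid method: score each candidate taxon with both kinds of evidence and combine the scores. The weighted sum, its 0.5 default weight, and the score tables below are assumptions for illustration; published hybrid tools combine their evidence in their own ways.

```python
def hybrid_classify(similarity_scores, composition_scores, weight=0.5):
    """Combine two per-taxon score tables (higher is better, each assumed scaled to [0, 1]).

    weight = 1.0 uses similarity only; weight = 0.0 uses composition only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    taxa = sorted(set(similarity_scores) | set(composition_scores))

    def combined(taxon):
        # A taxon missing from one table contributes 0 for that evidence source.
        return (weight * similarity_scores.get(taxon, 0.0)
                + (1.0 - weight) * composition_scores.get(taxon, 0.0))

    return max(taxa, key=combined)


# Hypothetical scores for one read.
SIM = {"A": 0.9, "B": 0.4}
COMP = {"A": 0.2, "B": 0.95, "C": 0.99}
print(hybrid_classify(SIM, COMP))  # B
```

In this toy case similarity alone would pick A and composition alone would pick C, while the combined score favours B, the taxon supported by both sources.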

Marker-based methods

Compare each metagenomic read to a curated collection of marker genes to identify high-confidence matches.

Disadvantage: Low sensitivity when the reads come from genomes that are not represented in the marker gene database.
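A minimal sketch of marker-based profiling, assuming a hypothetical two-clade marker database and treating an exact substring hit as a "high-confidence match". Hit counts are normalized by total marker length per clade before being rescaled to relative abundances; a read that matches no marker is simply ignored, which illustrates the sensitivity limitation above.

```python
from collections import defaultdict

# Hypothetical marker database: marker id -> (clade, marker gene sequence).
MARKERS = {
    "mkA1": ("clade_A", "ATGGCGTACGTTAGC"),
    "mkB1": ("clade_B", "ATGTTTCCCGGGAAA"),
}


def profile_with_markers(reads, markers, min_len=8):
    """Relative clade abundances from reads that match a marker exactly."""
    marker_len = defaultdict(int)
    for clade, seq in markers.values():
        marker_len[clade] += len(seq)

    hits = defaultdict(int)
    for read in reads:
        if len(read) < min_len:
            continue  # too short for a confident match
        for clade, seq in markers.values():
            if read in seq:  # exact substring = "high-confidence match" in this sketch
                hits[clade] += 1
                break  # count each read at most once
    # Normalize by total marker length per clade, then rescale to sum to 1.
    density = {clade: n / marker_len[clade] for clade, n in hits.items()}
    total = sum(density.values())
    return {clade: d / total for clade, d in density.items()} if total else {}


reads = ["GCGTACGT", "TTTCCCGG", "CCCGGGAA", "GGGGGGGG"]
print(profile_with_markers(reads, MARKERS))  # clade_A about 1/3, clade_B about 2/3
```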

Functional Analysis

Mapping against marker genes:
• Antimicrobial Resistance (AMR) genes
• Virulence Factors
• Transposons
• Enzymes
• etc.
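A hedged sketch of the counting step that follows mapping: given read-to-gene hits from any aligner, keep hits above identity and coverage thresholds and tally reads per functional category. The thresholds (90% identity, 60% coverage), the tuple layout, and the small annotation table are assumptions for illustration, not values taken from any particular AMR or virulence database.

```python
from collections import Counter

# Illustrative annotation table: gene name -> functional category.
GENE_CATEGORY = {
    "blaTEM-1": "AMR",
    "tetM": "AMR",
    "stx2": "Virulence factor",
}


def tally_functions(hits, gene_category, min_identity=0.9, min_coverage=0.6):
    """Count reads per functional category.

    hits: (read_id, gene, identity, coverage) tuples, assumed sorted best-first per read,
    so the first hit that passes the thresholds is the one counted for that read.
    """
    counted = set()
    tally = Counter()
    for read_id, gene, identity, coverage in hits:
        if read_id in counted or identity < min_identity or coverage < min_coverage:
            continue
        category = gene_category.get(gene)
        if category is None:
            continue  # gene not in the annotation table
        counted.add(read_id)
        tally[category] += 1
    return tally


HITS = [
    ("r1", "blaTEM-1", 0.99, 0.80),
    ("r1", "tetM", 0.95, 0.90),  # second hit for r1 is not counted again
    ("r2", "stx2", 0.97, 0.70),
    ("r3", "tetM", 0.85, 0.90),  # below the identity threshold
]
print(tally_functions(HITS, GENE_CATEGORY))
```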

Commonly used sequence search algorithms

• Variations of BLAST (blastn, blastx, MegaBLAST) – find regions of similarity between biological sequences.
• Hidden Markov Models (HMMER) – searches databases of sequence profiles (models) for homologs of a query sequence.
• Bowtie/Bowtie2 – aligns short reads to long reference sequences.
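BLAST is a heuristic that seeds on short exact words and extends them; the exact dynamic-programming local alignment it approximates (Smith-Waterman) is small enough to sketch. The linear gap penalty and the +2/-1/-2 match/mismatch/gap scores are arbitrary assumptions, not BLAST's defaults.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, plus the aligned region of each (linear gaps)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j), H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


print(smith_waterman("TTACGTAGG", "CCACGTACC"))  # (10, 'ACGTA', 'ACGTA')
```

The full table costs time proportional to the product of the sequence lengths, which is why tools searching large databases rely on seeding heuristics instead.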
Commonly used sequence search algorithms

• Burrows-Wheeler Aligner (BWA) – maps low-divergent sequences against a large reference genome.

• k-mers – exact-match search of a read's k-mers (substrings of length k) against a database of k-mers taken from reference sequences.
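As its name says, BWA indexes the reference with the Burrows-Wheeler transform. A toy sketch of that index follows: the BWT is built from sorted rotations, and backward search over the FM-index counts exact occurrences of a pattern. This is a simplification under stated assumptions: real aligners build the index far more efficiently and also tolerate mismatches and gaps, while this version only counts exact matches.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations; '$' marks the end of the text."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str):
    """Return C (count of smaller characters) and occ (prefix occurrence counts)."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, running = {}, 0
    for ch in sorted(counts):
        C[ch] = running
        running += counts[ch]
    # occ[ch][i] = number of times ch occurs in bwt_str[:i]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, current in enumerate(bwt_str):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (current == ch)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Backward search: count exact occurrences of pattern in the indexed text."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "GATTACAGATTACA"
transformed = bwt(reference)
C, occ = build_fm_index(transformed)
print(count_occurrences("GATTACA", C, occ, len(transformed)))  # 2
```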
Commonly used tools

Method name | Class of method | Sequence search method | Composition method | Functional classification
--- | --- | --- | --- | ---
MEGAN4 | Similarity | BLAST programs | N/A | KEGG, SEED
MG-RAST | Similarity | BLASTN, BLAT | N/A | SEED, NOG, COG, KEGG
CARMA3 | Similarity | BLAST programs | N/A | Pfam, COG, GO, TIGRFAM
Kraken | Similarity | Exact match k-mers | N/A | N/A
MGmapper | Similarity | BWA | N/A | N/A
MLTreeMap | Marker | BLASTX | N/A | 4 Enzyme families
AMPHORA2 | Marker | HMMER3 | N/A | N/A
MetaPhlAn | Marker | MEGABLAST, Bowtie2 | N/A | N/A
phymmBL | Hybrid | MEGABLAST | IMM | N/A
RITA | Hybrid | Pipeline of BLAST variations | NB | N/A
PhyloPythiaS | Composition | N/A | SVM | N/A
TACOA | Composition | N/A | k-NN | N/A

Peabody et al.
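The table lists Kraken as using exact-match k-mers. Below is a loosely Kraken-inspired sketch, not Kraken's implementation: each reference k-mer is stored with the lowest common ancestor (LCA) of the taxa containing it, and a read is assigned to the hit taxon whose root-to-taxon path collects the most k-mer hits. The taxonomy, genomes, k = 4, and alphabetical tie-breaking are hypothetical simplifications; real tools use much longer k-mers and resolve ties differently.

```python
from collections import Counter

# Hypothetical taxonomy: child -> parent ("root" has no parent).
PARENT = {
    "Bacteria": "root",
    "Genus_X": "Bacteria",
    "species_X1": "Genus_X",
    "species_X2": "Genus_X",
}
# Hypothetical reference genomes, one per species.
GENOMES = {"species_X1": "ACGTACGGTT", "species_X2": "ACGTACCCAA"}


def lineage(taxon, parent):
    """Taxon followed by its ancestors up to the root."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two taxa."""
    ancestors = set(lineage(a, parent))
    for taxon in lineage(b, parent):
        if taxon in ancestors:
            return taxon
    return "root"


def build_kmer_db(genomes, parent, k):
    """Map each k-mer to the LCA of all taxa whose genome contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = lca(db[kmer], taxon, parent) if kmer in db else taxon
    return db


def kraken_style_classify(read, db, parent, k):
    """Assign the read to the hit taxon with the highest root-to-taxon hit total."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = db.get(read[i:i + k])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # unclassified: no k-mer found in the database

    def path_score(taxon):
        return sum(hits[t] for t in lineage(taxon, parent))

    return max(sorted(hits), key=path_score)


DB = build_kmer_db(GENOMES, PARENT, k=4)
print(kraken_style_classify("CGTACGGT", DB, PARENT, k=4))  # species_X1
```

K-mers shared by both species map to Genus_X, so a read covering only shared k-mers is assigned at the genus level rather than forced to a species.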


