Program name: MaxBin

Version: 2.2
Developer: Yu-Wei Wu (ywwei@lbl.gov)
Affiliation: Joint BioEnergy Institute, Lawrence Berkeley National Laboratory

===================== Quick installation guideline =============================
=== 1. Download MaxBin and unzip it.
=== 2. Enter src directory under MaxBin and "make" it.
=== 3. Run "./autobuild_auxiliary" at MaxBin directory to download, compile,
===    and set up the auxiliary software packages.
=== 4. MaxBin should be ready to go.
===================== End of Quick installation guideline ======================

=== What MaxBin can do? ===


MaxBin is software for clustering metagenomic contigs into different bins, each
consisting of contigs from one species. MaxBin uses nucleotide composition
information and contig abundance information to achieve binning through an
Expectation-Maximization algorithm.

=== Support platform ===


MaxBin was developed and is maintained on the Linux platform. It has been tested
on docker images of Ubuntu, Redhat, SUSE, Debian, Fedora, and Mageia. The MaxBin
core program can now be compiled on Mac; however, the association of auxiliary
software needs to be performed manually on Mac. Please refer to the manual way
to install auxiliary software, described in the Installation section below.
=== Prerequisite ===
1. g++ (c++11 or higher), make
2. Perl5 with CPAN, LWP::Simple, and FindBin modules
   (./autobuild_auxiliary will use CPAN to check and install the other two Perl
   modules)
3. (for automatic downloading and installing auxiliary software) curl, unzip,
   which
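Before building, it can help to confirm that the command-line tools in the list
above are reachable. The following is a minimal sketch, not part of MaxBin: the
function name missing_tools is hypothetical, and the check only looks for the
executables on PATH. It does not verify C++11 support or the Perl modules.

```python
import shutil

# Tool names taken from the Prerequisite list; "perl" stands in for Perl5.
REQUIRED_TOOLS = ["g++", "make", "perl", "curl", "unzip", "which"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All listed prerequisites were found on PATH.")
```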
=== Installation ===
The installation of MaxBin is two-fold: the MaxBin executable must be built, and
MaxBin needs some auxiliary software packages to run correctly. The installation
steps are as follows.
1. Enter the src directory under the MaxBin directory and run "make" there. This
will build the MaxBin executable. The commands are:
$ cd src
$ make
$ cd ..
2. Download and compile auxiliary software packages. There are two ways:
- The easiest way is to execute the script "./autobuild_auxiliary". It will
  attempt to download all auxiliary software packages from mirror sites, compile
  them, and set the runtime environment. It is highly recommended that you try
  this option for less hassle.
- The manual way. Auxiliary packages include Bowtie2, FragGeneScan, IDBA-UD, and
  Hmmer3. Please follow the instructions in the software packages to compile and
  install them.
* Bowtie2 (tested version: bowtie2-2.0.0-beta7)
http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
* FragGeneScan (tested version: FragGeneScan 1.18)
http://omics.informatics.indiana.edu/FragGeneScan/
* Hmmer3 (tested version: HMMER 3.0.0 64 bit)
http://hmmer.janelia.org/
* IDBA-UD (tested version: IDBA-UD 1.1.1)
http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/
* You also need to specify the software paths. There are two different ways.
a. Export the paths of software executables in system path. For example, in
   Ubuntu Linux you can add a line in the ~/.bashrc file:
   export PATH=/usr/local/bin/FragGeneScan_1.18:/usr/local/bin/hmmer3:/usr/local/bin/bowtie2:/local/bin/idba-1.1.1:$PATH
   (This is just an example; you can specify your own software locations
   freely.)
b. Alternatively, you can specify the software paths in the "setting" file
   under MaxBin directory. The file is in the following format:
   [FragGeneScan] /local/bin/FragGeneScan1.18
   [Bowtie2] /local/bin/bowtie2-2.0.0-beta7
   [HMMER3] /local/bin/hmmer-3.0-linux-intel-x86_64/binaries
   [IDBA-UD] /local/bin/idba-1.1.1/bin
   (This is just an example. Please specify your own software locations.)
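If you prefer to generate the "setting" file rather than edit it by hand, a
small script can render it. This is only a sketch based on the example format
shown above: the helper names are hypothetical, and it assumes a single space
between the bracketed package name and the directory, as in the example.

```python
import os

# Package names as they appear in the example "setting" file.
SETTING_KEYS = ["FragGeneScan", "Bowtie2", "HMMER3", "IDBA-UD"]


def format_setting(paths):
    """Render {package name: directory} as "[name] directory" lines."""
    unknown = sorted(set(paths) - set(SETTING_KEYS))
    if unknown:
        raise ValueError("unexpected package name(s): " + ", ".join(unknown))
    lines = ["[%s] %s" % (key, paths[key]) for key in SETTING_KEYS if key in paths]
    return "\n".join(lines) + "\n"


def missing_directories(paths):
    """Return the package names whose directory does not exist on this machine."""
    return sorted(key for key, directory in paths.items()
                  if not os.path.isdir(directory))
```

Writing the returned text to the "setting" file under the MaxBin directory is
left to the caller, so that an existing file is never overwritten by accident.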

---Plotting the number of each marker gene in each contig---
MaxBin provides the functionality to plot the single copy marker genes in each
bin using R (with gplots package installed). Please install R beforehand and
make sure that R and Rscript can be executed by typing the two commands in the
command line.
=== Run MaxBin ===
To run MaxBin, please type in "perl run_MaxBin.pl" or just "run_MaxBin.pl". You
will see the options. Here are the options:
(required) -contig (contig filename)
(required) -out (output file header)
--- at least one of the following parameters is needed
(semi-required) -abund (contig abundance files. To be explained in the
  Abundance section below.)
(semi-required) -reads (reads file in fasta or fastq format. To be explained in
  the Abundance section below.)
(semi-required) -abund_list (a list file of all contig abundance files.)
(semi-required) -reads_list (a list file of all reads files.)
---
(optional) -thread (number of threads; default 1)
(optional) -reassembly (specify this option if you want to reassemble the bins.
  Note that at least one reads file needs to be designated.)
(optional) -prob_threshold (minimum probability for EM algorithm; default 0.8)
(optional) -plotmarker (specify this option if you want to plot the markers in
  each contig. Installing R is a must for this option to work.)
(optional) -verbose (as is. Warning: output log will be LOOOONG.)
(optional) -markerset (choose between 107 marker genes by default or 40 marker
  genes. See Marker Gene Note for more information.)
Example commands:
run_MaxBin.pl -contig mycontig -abund myabund -out myout
run_MaxBin.pl -contig mycontig -reads myreads -reads2 my2ndreads -reads3 my3rdreads -out myout
run_MaxBin.pl -contig mycontig -abund myabund -abund2 my2ndabund -abund3 my3rdabund -reads myreads -reads2 my2ndreads -out myout -thread 4
run_MaxBin.pl -contig mycontig -abund_list abund_list_file -reads_list reads_list_file -out myout -prob_threshold 0.5 -markerset 40
(Warning: Please do NOT specify abundance and reads that BELONG TO THE SAME
METAGENOME together. MaxBin will treat them as two different information and
thus will count this metagenome TWICE!)
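When MaxBin is launched from a script, the argument list can be assembled
programmatically. The sketch below is a hypothetical helper, not part of MaxBin.
It enforces the "at least one abundance or reads file" rule and numbers extra
files following the -abund2 / -reads3 pattern of the example commands. Whether
that numbering continues beyond the cases shown in the examples is an
assumption. abund and reads must be lists of filenames, not single strings.

```python
def build_maxbin_command(contig, out, abund=(), reads=(), thread=None,
                         prob_threshold=None, markerset=None):
    """Return an argument list for run_MaxBin.pl (hypothetical helper)."""
    if not abund and not reads:
        raise ValueError("give at least one abundance file or reads file")
    command = ["run_MaxBin.pl", "-contig", contig]
    for flag, files in (("-abund", abund), ("-reads", reads)):
        for index, filename in enumerate(files, start=1):
            # The first file uses the bare flag; later files follow the
            # -abund2 / -reads3 pattern seen in the example commands.
            numbered = flag if index == 1 else "%s%d" % (flag, index)
            command += [numbered, filename]
    command += ["-out", out]
    if thread is not None:
        command += ["-thread", str(thread)]
    if prob_threshold is not None:
        command += ["-prob_threshold", str(prob_threshold)]
    if markerset is not None:
        command += ["-markerset", str(markerset)]
    return command
```

The returned list can be passed to subprocess.run(command, check=True). The
helper cannot tell whether an abundance file and a reads file describe the same
metagenome, so the warning above still has to be checked by hand.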
=== Reassembly Note ===
The reassembly option is still highly experimental. To use it, you need to feed
MaxBin an "interleaved paired-end" fastq or fasta file. An example of an
interleaved paired-end fasta file is as follows:
>reads1.1
AAAAA
>reads1.2
CCCCC
>reads2.1
TTTTT
>reads2.2
GGGGG
>reads3.1
AAAAA
>reads3.2
TTTTT
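
If your paired-end reads are stored as two separate fasta files, they have to be
merged into the interleaved layout shown above. The following is a minimal
sketch with hypothetical function names. It assumes both files list the mates in
the same order and keeps the headers unchanged. It handles fasta only, not
fastq.

```python
from itertools import zip_longest


def read_fasta(path):
    """Yield (header, sequence) pairs from a fasta file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def interleave_fasta(forward_path, reverse_path, out_path):
    """Write mates alternately (read 1, mate 1, read 2, mate 2, ...)."""
    pairs = 0
    with open(out_path, "w") as out:
        for first, second in zip_longest(read_fasta(forward_path),
                                         read_fasta(reverse_path)):
            if first is None or second is None:
                raise ValueError("the two files contain different numbers of reads")
            out.write(">%s\n%s\n>%s\n%s\n" % (first + second))
            pairs += 1
    return pairs
```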

=== Marker Gene Note ===


By default MaxBin will look for 107 marker genes present in >95% of bacteria.
Alternatively you can also choose the 40 marker gene set that is universal among
bacteria and archaea (Wu et al., PLoS ONE 2013). This option may be better
suited for environments dominated by archaea; however, it tends to split genomes
into more bins. You can choose between the marker gene sets and see which one
works better.
=== Abundance ===
The contig abundance information can be provided in two ways: user can choose to
provide the abundance file or MaxBin will use Bowtie2 to map the sequencing
reads against contigs and generate the abundance information.
---if you have the abundance information---
Please make sure that your abundance information is provided in the following
format (\t stands for a tab delimiter):

(contig header)\t(abundance)
For example, assume I have three contigs named A0001, A0002, and A0003, then my
abundance file will look like
A0001	30.89
A0002	20.02
A0003	78.93
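
A quick format check before running MaxBin can catch files that use spaces
instead of tabs. This is only a sketch with a hypothetical function name. It
checks the two-column, tab-delimited layout described above and that the second
column is numeric. It does not check that the headers match your contig file.

```python
def read_abundance(path):
    """Parse tab-separated (contig header, abundance) lines into a dict."""
    abundance = {}
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError("line %d: expected 2 tab-separated fields, "
                                 "found %d" % (number, len(fields)))
            header, value = fields
            # float() raises ValueError if the abundance is not a number.
            abundance[header] = float(value)
    return abundance
```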

---if you don't have abundance information---
Don't worry, MaxBin will generate this information for you from sequencing
reads. Please specify the reads file (in fasta format) in -reads.
---Providing more than one reads or abundance files---
The reads and abundance files can be provided via a list file naming all reads
or abundance files. For example, suppose you have 3 abundance files and 5 reads
files, and you want to provide them all. You can create two files "abund_list"
and "reads_list", which are
$ cat abund_list
(first abundance file)
(second abundance file)
(third abundance file)
$ cat reads_list
(first reads file)
(second reads file)
(third reads file)
(fourth reads file)
(fifth reads file)
Then you can feed all information into MaxBin using -abund_list and -reads_list
options. Handy for a large number of reads or abundance files.
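For many files, the list files can be written by a script instead of by hand.
The helper below is a hypothetical sketch that writes one filename per line, as
in the "cat" output above. The filenames themselves are up to you.

```python
def write_list_file(list_path, filenames):
    """Write one filename per line and return how many were written."""
    filenames = list(filenames)
    if not filenames:
        raise ValueError("the list of files is empty")
    with open(list_path, "w") as out:
        for name in filenames:
            out.write(name + "\n")
    return len(filenames)
```

For example, write_list_file("reads_list", sorted(glob.glob("reads/*.fasta")))
would collect every fasta file in a directory named reads. The directory name
here is only an illustration, and glob must be imported first.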
=== MaxBin Output ===
Assume your output file header is (out). MaxBin will generate information using
this file header as follows.
(out).0XX.fasta -- the XX bin. XX are numbers, e.g. out.001.fasta
(out).summary -- a summary file describing which contigs are being classified
  into which bin.
(out).log -- a log file recording the core steps of MaxBin algorithm
(out).marker -- marker gene presence numbers for each bin. This table is ready
  to be plotted by R or other 3rd-party software.
(out).marker.pdf -- visualization of the marker gene presence numbers using R.
  Will only appear if -plotmarker is specified.
(out).noclass -- this file stores all sequences that pass the minimum length
  threshold but are not classified successfully.
(out).tooshort -- this file stores all sequences that do not meet the minimum
  length threshold.
(out).marker_of_each_gene.tar.gz -- this tarball file stores all markers
  predicted from the individual bins. Use
  "tar -zxvf (out).marker_of_each_gene.tar.gz" to extract the markers
  [(out).0XX.marker.fasta].
(if -reassembly is given)
(out)_reassem/(out).reads.0xx -- the collected reads for the 0xx bin.
(out)_reassem/(out).reads.noclass -- reads that cannot be assigned to any bin.
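
After a run, the bin files can be collected by their naming pattern. The sketch
below uses hypothetical helper names and relies only on the (out).0XX.fasta
naming described above. It assumes each contig in a bin file starts with a ">"
header line, as in ordinary fasta.

```python
import glob
import re


def find_bins(out_header):
    """Return {bin number: path} for files named like (out).001.fasta."""
    pattern = re.compile(re.escape(out_header) + r"\.(\d+)\.fasta")
    bins = {}
    for path in glob.glob(glob.escape(out_header) + ".*.fasta"):
        match = pattern.fullmatch(path)
        if match:  # skips extracted files such as (out).001.marker.fasta
            bins[int(match.group(1))] = path
    return dict(sorted(bins.items()))


def count_contigs(fasta_path):
    """Count the sequences in a fasta file by counting header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, {n: count_contigs(p) for n, p in find_bins("myout").items()} gives
the number of contigs in each bin for the output header "myout".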

=== Bug or problems ===
Encounter bugs, problems, or have any suggestions? Please contact Yu-Wei Wu
(ywwei@lbl.gov).