4/19/2018 GitHub - enormandeau/ncbi_blast_tutorial: Short introduction to using NCBI blast tools from the command line


enormandeau / ncbi_blast_tutorial


.gitignore First commit Apr 21, 2016

README.md Add -outfmt options Apr 21, 2016

reference.fasta First commit Apr 21, 2016

sequences.fasta First commit Apr 21, 2016

NCBI blast tutorial


Short introduction to using NCBI blast tools from the command line

Using Blast from the command line


Sometimes you may need to run blast on your own computer, for example to query thousands of sequences against a custom database of hundreds of thousands of sequences. To do that, you need to install the blast+ tools, format the database, and then blast your sequences against it.

Here is a short tutorial on how to do this.

Installing Blast+ tools


Get the compiled executables from this URL:

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Decompress the archive. For example:

tar xvfz ncbi-blast-2.3.0+-x64-linux.tar.gz

https://github.com/enormandeau/ncbi_blast_tutorial 1/3

Add the bin folder from the extracted archive to your path. For example, add the following line to your ~/.bashrc file:

export PATH="/PATH/TO/ncbi-blast-2.3.0+/bin:$PATH"

Replace the /PATH/TO part with the directory where you extracted the archive, then open a new terminal or run source ~/.bashrc so the change takes effect.
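Running that step twice leaves duplicate lines in ~/.bashrc. The sketch below is one way to make it repeatable: a bash function that appends the export line only if that exact line is not already there. The function name add_blast_to_path and its optional second argument (the file to edit) are hypothetical and not part of blast+.

```shell
# Hypothetical helper: append an export line for a blast+ bin folder to
# ~/.bashrc (or another file given as $2), only if that exact line is absent.
add_blast_to_path() {
    local bin_dir="$1"                      # e.g. /PATH/TO/ncbi-blast-2.3.0+/bin
    local rc_file="${2:-$HOME/.bashrc}"
    local line="export PATH=\"${bin_dir}:\$PATH\""

    touch "$rc_file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

For example, run add_blast_to_path "$HOME/tools/ncbi-blast-2.3.0+/bin" followed by source ~/.bashrc; command -v blastn should then print the path to blastn.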

Example sequences to use with the tutorial


To test blast, you need some FASTA files. Use the following two files that come with the tutorial:

sequences.fasta (the query sequences)

reference.fasta (the sequences used to build the database)
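A quick sanity check before going further is to count how many sequences each file contains. This minimal sketch counts FASTA header lines (those starting with >); count_fasta is a hypothetical helper name.

```shell
# Hypothetical helper: print "<file><TAB><number of sequences>" for each
# FASTA file given as an argument.
count_fasta() {
    local f
    for f in "$@"; do
        # Every FASTA record starts with exactly one header line beginning with '>'.
        printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
    done
}
```

For example, count_fasta sequences.fasta reference.fasta prints one line per file with its name and its number of sequences.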

Create blast database


The blast search tools need a formatted database to search against. To create one from reference.fasta, use the makeblastdb tool:

makeblastdb -in reference.fasta -title reference -dbtype nucl -out databases/reference

This creates several files named reference.* in the databases folder. Together, they make up the blast database.
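To confirm that makeblastdb worked, you can check that the core files of a nucleotide database are present: .nhr (headers), .nin (index) and .nsq (sequences). The sketch below assumes a single-volume database; very large databases are split into numbered volumes tied together by a .nal alias file, and newer blast+ versions write a few extra files. check_nucl_db is a hypothetical name. The blastdbcmd tool from the same distribution can also print database statistics with blastdbcmd -db databases/reference -info.

```shell
# Hypothetical helper: check that the three core files of a single-volume
# nucleotide blast database exist and are non-empty for a given -out prefix.
check_nucl_db() {
    local prefix="$1" ext missing=0
    for ext in nhr nin nsq; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"   # 0 if all files are present, 1 otherwise
}
```

For example: check_nucl_db databases/reference && echo "database ready"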

Blast

We can now blast our sequences against the database. In this case, both our query sequences and database sequences are DNA
sequences, so we use the blastn tool:

blastn -db databases/reference -query sequences.fasta -evalue 1e-3 -word_size 11 -outfmt 0 > sequences.reference

You can choose a different output format with the -outfmt option:

-outfmt <String>
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1,
10 = Comma-separated values,
11 = BLAST archive format (ASN.1)
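The tabular formats (6 and 7) are the easiest to post-process with standard Unix tools. By default, -outfmt 6 writes 12 tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. You can also request specific columns, for example -outfmt "6 qseqid sseqid pident evalue bitscore". The sketch below assumes the default 12 columns, saved in a hypothetical file sequences.reference.tsv by rerunning blastn with -outfmt 6, and keeps only the hit with the highest bitscore for each query. best_hit_per_query is a hypothetical name.

```shell
# Hypothetical helper: from blastn -outfmt 6 output (12 default columns,
# query id in column 1, bitscore in column 12), print the line with the
# highest bitscore for each query, in the order queries first appear.
best_hit_per_query() {
    awk -F '\t' '
        # New query, or a better bitscore than the best seen so far for it.
        !($1 in best) || $12 + 0 > best[$1] { best[$1] = $12 + 0; line[$1] = $0 }
        # Remember the first-seen order of queries.
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
}
```

For example, best_hit_per_query sequences.reference.tsv > best_hits.tsv (another hypothetical file name). If you request custom columns, change $12 to the position of bitscore in your column list.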

Blast with parallel


If you need to run your blasts faster (and who doesn't?), you can use GNU parallel to run several blastn processes at once, by default one per CPU core. You can get it from the GNU Parallel website: https://www.gnu.org/software/parallel/

Download the archive, extract it (with tar xvjf parallel-latest.tar.bz2), go into the extracted folder and install it with the following commands:

./configure

make
sudo make install

We can now use parallel to speed up blast:

time cat sequences.fasta | parallel -k --block 1k --recstart '>' --pipe 'blastn -db databases/reference -query - -evalue 1e-3 -word_size 11 -outfmt 0' > sequences.reference
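In that command, --pipe and --recstart '>' make parallel cut the input into blocks of roughly --block bytes without splitting a FASTA record, and feed each block to its own blastn process on standard input (hence -query -), while -k keeps the output in input order. If you would rather have the chunks as separate files, for example to submit them as separate cluster jobs, the rough sketch below does the same record-aware splitting with awk. Unlike --block, it splits by number of records rather than by size; split_fasta and the chunk_N.fasta naming are hypothetical.

```shell
# Hypothetical helper: split a FASTA file into chunk_1.fasta, chunk_2.fasta, ...
# in an output directory, each with at most N records (N must be > 0).
# A new chunk only starts at a '>' header line, so records are never cut.
split_fasta() {
    local input="$1" per_chunk="$2" outdir="${3:-.}"
    mkdir -p "$outdir"
    awk -v n="$per_chunk" -v dir="$outdir" '
        /^>/ {
            # Open a new chunk file every n records.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s/chunk_%d.fasta", dir, ++chunk)
            }
            count++
        }
        # Lines before the first header (if any) are skipped.
        out != "" { print > out }
    ' "$input"
}
```

For example, split_fasta sequences.fasta 100 chunks writes chunks/chunk_1.fasta, chunks/chunk_2.fasta and so on, each of which can be passed to blastn with -query.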

More options and getting help


To list the options and parameters accepted by blastn and the other blast+ utilities, use the --help option and pipe the output into less, for example:

blastn --help | less

NCBI blast tools cover more than DNA against DNA searches. For example, you can search a protein database with either DNA sequences (blastx) or protein sequences (blastp). Here is the full list of programs that come with the blast+ distribution:

blastdb_aliastool
blastdbcheck
blastdbcmd
blast_formatter
blastn
blastp
blastx
convert2blastmask
deltablast
dustmasker
legacy_blast.pl
makeblastdb
makembindex
makeprofiledb
psiblast
rpsblast
rpstblastn
segmasker
tblastn
tblastx
update_blastdb.pl
windowmasker
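For the four most common searches, the choice of program follows from whether the query and the database are nucleotide or protein. The sketch below encodes that mapping. pick_blast_program and its nucl/prot arguments (borrowed from the makeblastdb -dbtype values) are hypothetical, and it leaves out specialized programs such as tblastx, which compares translated nucleotide queries against a translated nucleotide database.

```shell
# Hypothetical helper: print the blast+ search program matching a query type
# and a database type, each given as "nucl" or "prot".
pick_blast_program() {
    case "$1:$2" in
        nucl:nucl) echo blastn ;;   # DNA query vs DNA database
        prot:prot) echo blastp ;;   # protein query vs protein database
        nucl:prot) echo blastx ;;   # translated DNA query vs protein database
        prot:nucl) echo tblastn ;;  # protein query vs translated DNA database
        *) echo "usage: pick_blast_program nucl|prot nucl|prot" >&2; return 1 ;;
    esac
}
```

For example, $(pick_blast_program nucl prot) expands to blastx, which needs a database built with makeblastdb -dbtype prot.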

References
O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.

Licence

NCBI blast tutorial by Eric Normandeau is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://github.com/enormandeau/ncbi_blast_tutorial.
