enormandeau / ncbi_blast_tutorial
Short introduction to using NCBI blast tools from the command line
Download the blast+ archive for your platform from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and extract it.
https://github.com/enormandeau/ncbi_blast_tutorial 1/3
4/19/2018 GitHub - enormandeau/ncbi_blast_tutorial: Short introduction to using NCBI blast tools from the command line
Add the bin folder from the extracted archive to your path. For example, add the following line to your ~/.bashrc file:
export PATH="/PATH/TO/ncbi-blast-2.3.0+/bin":$PATH
Replace the /PATH/TO part with the directory where you extracted the archive.
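To check that the change took effect, the following minimal sketch prepends the folder to the PATH of the current shell and asks the shell whether it can find blastn. The BLAST_BIN variable name is only illustrative, and /PATH/TO is still a placeholder that you need to replace.

```shell
# Placeholder location; replace /PATH/TO with the real directory.
BLAST_BIN="/PATH/TO/ncbi-blast-2.3.0+/bin"

# Prepend the BLAST bin folder to PATH for the current shell session.
export PATH="$BLAST_BIN:$PATH"

# Report whether the shell can now find blastn.
if command -v blastn >/dev/null 2>&1; then
    blastn -version
else
    echo "blastn not found; check that $BLAST_BIN exists" >&2
fi
```

If blastn is found, its version is printed; otherwise the message points at the folder to double-check.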
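The examples use two FASTA files: sequences.fasta (the query sequences) and reference.fasta (the sequences used to build the database).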
Building the database (with makeblastdb) creates several files in the databases folder. Together, these files make up the blast database.
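The database-building command itself is not shown here. As a minimal sketch, assuming the input file is reference.fasta and that the output prefix should match the `-db databases/reference` argument used later, it could look like this. The variable names are illustrative only.

```shell
# Assumed input file and output prefix, chosen to match the later
# "-db databases/reference" argument.
REFERENCE="reference.fasta"
DB_PREFIX="databases/reference"

# Create the output folder if it does not exist yet.
mkdir -p "$(dirname "$DB_PREFIX")"

if command -v makeblastdb >/dev/null 2>&1 && [ -f "$REFERENCE" ]; then
    # -dbtype nucl: the reference sequences are DNA.
    makeblastdb -in "$REFERENCE" -dbtype nucl -out "$DB_PREFIX"
    # List the files that make up the database.
    ls "$(dirname "$DB_PREFIX")"
else
    echo "makeblastdb or $REFERENCE not found; nothing was built" >&2
fi
```

For a nucleotide database, the listed files typically include ones ending in .nhr, .nin and .nsq. Newer blast+ versions may create additional files.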
Blast
We can now blast our sequences against the database. In this case, both our query sequences and database sequences are DNA
sequences, so we use the blastn tool:
blastn -db databases/reference -query sequences.fasta -evalue 1e-3 -word_size 11 -outfmt 0 > sequences.reference
You can choose a different output format with the -outfmt option:
-outfmt <String>
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1,
10 = Comma-separated values,
11 = BLAST archive format (ASN.1)
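The tabular format (6) is the easiest one to post-process in scripts. In its default layout each hit is one tab-separated line with twelve columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below filters such lines by percent identity and alignment length. The function names, the thresholds (90 and 100) and the sample lines are made up for illustration and are not real BLAST results.

```shell
# Keep hits with percent identity >= $1 and alignment length >= $2.
# Reads tab-separated "-outfmt 6" lines on stdin; column 3 is pident
# and column 4 is the alignment length in the default layout.
filter_hits() {
    awk -F'\t' -v min_id="$1" -v min_len="$2" \
        '$3 >= min_id && $4 >= min_len'
}

# Made-up example lines (not real BLAST results), default 12 columns.
sample_hits() {
    printf 'seq1\tref1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-95\t350\n'
    printf 'seq1\tref2\t85.00\t150\t22\t1\t1\t150\t5\t154\t2e-30\t140\n'
    printf 'seq2\tref1\t99.00\t60\t1\t0\t1\t60\t300\t359\t4e-25\t105\n'
}

# Only the first sample line passes both thresholds.
sample_hits | filter_hits 90 100
```

With real data, the same filter can be fed directly from a search, for example `blastn -db databases/reference -query sequences.fasta -outfmt 6 | filter_hits 90 100`.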
Download the GNU Parallel archive, extract it (with tar xjf parallel-latest.tar.bz2), change into the extracted folder, and install it with the following commands:
./configure
make
sudo make install
time cat sequences.fasta | parallel -k --block 1k --recstart '>' --pipe 'blastn -db databases/reference -query - -evalue 1e-3 -word_size 11 -outfmt 0' > sequences.reference
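In this command, `--pipe` splits the standard input into chunks, `--recstart '>'` makes every chunk start at a FASTA header, `--block 1k` sets the approximate chunk size, and `-k` keeps the output in input order. A quick way to check that no sequence was lost is to compare the number of input records with the number of query reports in the output. The sketch below assumes the pairwise format (`-outfmt 0`), in which each query report begins with a line starting with `Query=`. The function names are illustrative.

```shell
# Count FASTA records: every record starts with a ">" header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count query reports in pairwise (-outfmt 0) output, assuming each
# report begins with a line starting with "Query=".
count_reported_queries() {
    grep -c '^Query=' "$1"
}

# Compare the two counts when both files are present.
if [ -f sequences.fasta ] && [ -f sequences.reference ]; then
    echo "input records:    $(count_fasta_records sequences.fasta)"
    echo "reported queries: $(count_reported_queries sequences.reference)"
fi
```

If the two numbers differ, the chunking options are the first thing to look at.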
NCBI blast tools cover more cases than DNA against DNA searches. For example, you can search a protein database with either
DNA or protein sequences. Here is an exhaustive list of the programs that come with the blast+ distribution:
blastdb_aliastool
blastdbcheck
blastdbcmd
blast_formatter
blastn
blastp
blastx
convert2blastmask
deltablast
dustmasker
legacy_blast.pl
makeblastdb
makembindex
makeprofiledb
psiblast
rpsblast
rpstblastn
segmasker
tblastn
tblastx
update_blastdb.pl
windowmasker
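Which search program to use depends on whether the query and the database contain DNA or protein sequences. As a rough sketch, the small helper below maps the two types to the usual program. The function name and the "nucl"/"prot" labels are assumptions made for this example, not part of blast+.

```shell
# Print the BLAST+ search program for a query type and a database type.
# Both arguments are "nucl" or "prot" (labels chosen for this example).
pick_blast_program() {
    case "$1:$2" in
        nucl:nucl) echo "blastn" ;;   # DNA query, DNA database
        prot:prot) echo "blastp" ;;   # protein query, protein database
        nucl:prot) echo "blastx" ;;   # translated DNA query, protein database
        prot:nucl) echo "tblastn" ;;  # protein query, translated DNA database
        *) echo "unknown combination: $1 vs $2" >&2; return 1 ;;
    esac
}

# Example: DNA queries searched against a protein database.
pick_blast_program nucl prot
```

tblastx covers the remaining case, in which both the DNA query and the DNA database are translated before being compared.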
References
O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.
Licence
NCBI blast tutorial by Eric Normandeau is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://github.com/enormandeau/ncbi_blast_tutorial.