
Compiling NCBI C++ Toolkit

NCBI C++ Toolkit version 12.0.0[1] is installed on Proteus. Use the module:

ncbi-toolkit/gcc/64/12.0.0

There may be module dependencies, so check for warning messages when you run "module load ncbi-toolkit".

Basic Usage

Load the Module

First, load the module which provides the toolkit:

[juser@proteusa01 ~]$ module load ncbi-toolkit/gcc/64/12.0.0

Check that it worked:

[juser@proteusa01 ~]$ which update_blastdb.pl
/mnt/HA/opt/ncbi-toolkit/gcc/64/12.0.0/bin/update_blastdb.pl

DB Location

Choose a directory to hold all the downloaded database files. If your group will share the database, put this directory in your group directory. This example uses ~/ncbi_db/:

[juser@proteusa01 ~]$ mkdir ncbi_db

Alternatively, use the local copy; see BLAST Databases.

Set the BLASTDB Environment Variable

You can set the BLASTDB environment variable[2] in your .bashrc file. You can also set it manually in any job script you write.

export BLASTDB=~/ncbi_db
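If you prefer to script the .bashrc change, the following is a minimal sketch. The helper name add_blastdb_to_bashrc is hypothetical, and it assumes the ~/ncbi_db location used above. It appends the export line only when an identical line is not already present, so running it twice does not leave duplicate settings.

```shell
#!/bin/bash
# Hypothetical helper: append the BLASTDB setting to a bash startup file
# (default: ~/.bashrc) unless an identical line is already present.
add_blastdb_to_bashrc() {
    local rc=${1:-$HOME/.bashrc}
    # Single quotes keep "~" literal in the file; bash expands it when
    # the file is read at login.
    local line='export BLASTDB=~/ncbi_db'
    touch "$rc" || return 1
    # -q: status only, -x: whole-line match, -F: fixed string (no regex).
    if ! grep -qxF -- "$line" "$rc"; then
        # Start on a fresh line if the file does not end with a newline.
        if [ -s "$rc" ] && [ -n "$(tail -c 1 "$rc")" ]; then
            echo >> "$rc"
        fi
        printf '%s\n' "$line" >> "$rc"
    fi
}
```

Calling add_blastdb_to_bashrc with no argument edits ~/.bashrc. The setting takes effect in new login shells, or in the current shell after "source ~/.bashrc".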

Download an Updated Database

We will use the nr database as an example. If the BLASTDB environment variable is not set in your shell, set it manually (see above).

See what databases are available:

[juser@proteusa01 ~]$ update_blastdb.pl --showall

In case the tools do not pick up the BLASTDB environment variable correctly, cd into that directory before running the update. The update can take up to an hour:

[juser@proteusa01 ~]$ cd $BLASTDB
[juser@proteusa01 ncbi_db]$ update_blastdb.pl nr
...

After the update completes, verify the checksums to confirm that all files were downloaded correctly:

[juser@proteusa01 ncbi_db]$ md5sum -c *.md5
nr.01.tar.gz: OK
nr.02.tar.gz: OK
...

Extract all the tarballs:

[juser@proteusa01 ncbi_db]$ for x in nr.*.tar.gz ; do tar xf $x ; done
...

This produces many files: .phr, .psd, .psq, etc.

Delete the tarballs:

[juser@proteusa01 ncbi_db]$ rm -f *.tar.gz

Keep the *.md5 files: update_blastdb.pl uses them to tell whether each local database is up to date.
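The manual steps above can be bundled into one shell function, for example for use in a job script. This is a sketch under stated assumptions, not a tool shipped with the toolkit: the name refresh_blastdb is hypothetical, the ncbi-toolkit module is assumed to be loaded, and BLASTDB is assumed to be set. It verifies every tarball before extracting any of them. The exit status of update_blastdb.pl is only reported, not acted on; the checksums decide whether a download is usable.

```shell
#!/bin/bash
# Hypothetical helper: download, verify, extract, and clean up one BLAST
# database (default: nr). Assumes update_blastdb.pl is on PATH (module
# loaded) and BLASTDB is set. The body runs in a subshell ( ... ), so the
# cd below does not change the caller's working directory.
refresh_blastdb() (
    db=${1:-nr}
    cd "${BLASTDB:?BLASTDB is not set}" || exit 1

    # Report, but do not act on, a non-zero status; the checksums below
    # decide whether the downloaded files are usable.
    update_blastdb.pl "$db" || echo "update_blastdb.pl exited with status $?" >&2

    # Positional parameters now list the tarballs present after the update.
    set -- "$db".*.tar.gz
    if [ ! -e "$1" ]; then
        echo "no $db tarballs to extract"
        exit 0
    fi

    # Verify every tarball before extracting any of them.
    for x in "$@"; do
        md5sum -c "$x.md5" || exit 1
    done
    for x in "$@"; do
        tar xzf "$x" || exit 1
    done

    # Keep the .md5 files: update_blastdb.pl uses them to tell whether the
    # local copy is up to date.
    rm -f -- "$@"
)
```

For example, "refresh_blastdb nr" can be run from any directory. If the database is already current, no tarballs are downloaded and the function simply reports that there is nothing to extract.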

Run Multithreaded

The NCBI Toolkit installation on Proteus does not use MPI, but it is multithreaded. That means it can use multiple processor cores on a single compute node, but it cannot spread a computation across multiple compute nodes. Most NCBI Toolkit command-line tools have an option to set the number of threads. In a job script, Grid Engine sets the NSLOTS environment variable to the number of slots requested. So:

#$ -pe shm 8
...
blastx -num_threads ${NSLOTS} ...
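A fuller job script might look like the sketch below. The query and output file names (query.fasta, blastx_nr.out) are hypothetical, the nr database and ~/ncbi_db location follow the examples above, and resource requests other than the slot count (run time, memory, project) are omitted; add whatever your jobs normally need. It also assumes the module command is available inside batch jobs. The helper falls back to one thread when NSLOTS is unset, and the search only runs when JOB_ID, which Grid Engine sets for every job, is present.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe shm 8
# Sketch of a Proteus batch job. query.fasta and blastx_nr.out are
# hypothetical file names; BLASTDB is assumed to be ~/ncbi_db as above.

# Grid Engine sets NSLOTS to the number of slots granted by "-pe shm 8".
# Fall back to 1 thread when the variable is unset (e.g. outside a job).
nthreads() {
    echo "${NSLOTS:-1}"
}

# Search one query file against nr, using one thread per granted slot.
run_blastx() {
    blastx -query "$1" -db nr -out "$2" -num_threads "$(nthreads)"
}

# JOB_ID is set by Grid Engine, so the search only runs as a batch job.
# Assumes the "module" command is available in batch shells.
if [ -n "${JOB_ID:-}" ]; then
    module load ncbi-toolkit/gcc/64/12.0.0
    export BLASTDB=~/ncbi_db
    run_blastx query.fasta blastx_nr.out
fi
```

Submit it with qsub as usual. Deriving the thread count from NSLOTS, rather than hard-coding 8, keeps blastx in step with the "-pe shm" request if you later change the slot count.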

WARNING

Using the NCBI-hosted databases through the "-remote" option will get Proteus blocked by NCBI for overuse, especially when the option is used in batch jobs on the cluster.

Compiling

[juser@proteusi01 ncbi_cxx--12_0_0]$ module list
Currently Loaded Modulefiles:
  1) shared           2) proteus          3) gcc/4.8.1        4) sge/univa        5) hdf5_18/1.8.11
[juser@proteusi01 ncbi_cxx--12_0_0]$ ./configure LDFLAGS="-L$HDF5DIR" CPPFLAGS="-I$HDF5INCLUDE" \
    --prefix=/mnt/HA/opt/ncbi_cxx/gcc/12.0.0 --with-algo --with-png --with-tiff --with-pcre \
    --with-z --with-mysql --with-check --with-boost --with-xerces --with-libxslt \
    --with-sge=/cm/shared/apps/sge/univa --with-xalan --with-gif --with-jpeg --with-xpm \
    --with-curl --with-hdf5=${HDF5DIR} \
    --with-mt --with-64 --without-debug --with-optimization --with-dll --with-runpath

========= 2014-08-28

module list
  1) shared
  2) proteus
  3) gcc/4.8.1
  4) sge/univa
  5) proteus-blas/gcc/64/20110419
  6) proteus-lapack/gcc/64/3.5.0
  7) proteus-fftw3/gcc/64/3.3.3
  8) python/2.7.8
  9) proteus-openmpi/gcc/64/1.8.1-mlnx-ofed
 10) boost/openmpi/gcc/64/1.56.0
 11) hdf5_18/1.8.11

export CFLAGS="-O3 -mavx -msse4.2 -mfpmath=sse"
export CXXFLAGS="${CFLAGS}"
export NCBIPREFIX="/mnt/HA/opt/ncbi-toolkit/gcc/64/12.0.0"
./configure --prefix=${NCBIPREFIX} --with-mt --with-64 --with-lfs --with-check \
    --with-bin-release --with-strip --with-sge=$SGE_ROOT --with-3psw=std:netopt \
    --with-app --with-boost=$BOOSTDIR --with-optimization --without-debug \
    --with-dll

The build takes place in the directory GCC481-ReleaseMTDLL64.
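The 2014-08-28 configuration compiles with -mavx -msse4.2, so the resulting binaries will likely fail with an illegal-instruction error on any node whose CPU lacks AVX or SSE4.2. Below is a minimal sketch for checking a node before relying on them. It assumes the Linux /proc/cpuinfo layout, where SSE4.2 appears as "sse4_2"; the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: check a Linux cpuinfo file for the instruction-set
# extensions that the 2014-08-28 CFLAGS (-mavx -msse4.2) rely on.

# Usage: cpu_has_flag FLAG [CPUINFO_FILE]
cpu_has_flag() {
    local flag=$1 info=${2:-/proc/cpuinfo}
    # Search only the "flags" lines; -w requires a whole-word match, so
    # "avx" does not match "avx2".
    grep '^flags' "$info" | grep -qw -- "$flag"
}

# Usage: check_build_cpu [CPUINFO_FILE]
# Returns 0 if both extensions are present, 1 otherwise.
check_build_cpu() {
    local f missing=0
    # /proc/cpuinfo spells SSE4.2 as "sse4_2".
    for f in avx sse4_2; do
        if ! cpu_has_flag "$f" "${1:-/proc/cpuinfo}"; then
            echo "this CPU does not report $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run check_build_cpu on a compute node, for example inside a short job. A non-zero status means binaries from this build should not be expected to run there.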

See Also

References

[1] NCBI C++ Toolkit website

[2] NCBI BLAST Help: Configuration