VSEARCH

VSEARCH is a free, open-source, 64-bit multithreaded tool for processing metagenomic sequences, including searching, clustering, chimera detection, dereplication, sorting, masking and shuffling.[1] It is an alternative to USEARCH by Robert C. Edgar.[2]

Installed Versions

VSEARCH 1.1.3, 1.10.0, and 2.7.1 are installed on Proteus. Use one of the modules:

vsearch/gcc/1.1.3
vsearch/gcc/1.10.0
vsearch/gcc/2.7.1

Each module also provides a man page. To view it (after loading one of the modulefiles above):

[juser@proteusa01 ~]$ man vsearch

The full documentation and the tests are available in $VSEARCHDIR.

Using VSEARCH with Multiple Threads

When submitting a job, request the "shm" parallel environment with the number of slots equal to the number of threads required:

#$ -pe shm 16

And then, when running vsearch, specify the number of threads (otherwise it will try to use all CPU cores):

vsearch --threads $NSLOTS ...
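Putting the two pieces together, a complete job script might look like the sketch below. The input and output file names, the 0.97 identity threshold, the choice of the 2.7.1 module, and the "-S" and "-cwd" directives are illustrative assumptions, not site requirements. The scheduler sets NSLOTS to the number of slots granted; the fallback to one thread only matters when the script is run outside a job.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe shm 16

# Load one of the installed modules if the module command is available.
if type module >/dev/null 2>&1; then
    module load vsearch/gcc/2.7.1
fi

# NSLOTS is set by the scheduler; fall back to 1 thread outside a job.
THREADS="${NSLOTS:-1}"

# Hypothetical input files, for illustration only.
QUERIES=queries.fasta
DB=db.fasta

if command -v vsearch >/dev/null 2>&1 && [ -f "$QUERIES" ] && [ -f "$DB" ]; then
    # Global alignment search of the queries against the database.
    vsearch --threads "$THREADS" \
            --usearch_global "$QUERIES" \
            --db "$DB" \
            --id 0.97 \
            --blast6out hits.b6
else
    echo "vsearch or input files not found; would have used $THREADS thread(s)" >&2
fi
```

Passing "$THREADS" rather than a hard-coded number keeps the thread count in step with the "-pe shm" request if that line is later changed.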

Compiling 1.1.3

The developers provide three makefiles for three build variants: a "plain" one (Makefile), one that can read and write gzipped data (Makefile.ZLIB), and one that can read and write bzip2-compressed data (Makefile.BZLIB). This article covers only the "plain" version for now.

Download a tarball and expand it. Note that the distribution tarball includes prebuilt executables of many other versions for many other machines. You can delete them.

Edit the Makefile and Compile

Modify src/Makefile according to this diff:

--- Makefile.orig       2015-03-18 05:58:00.000000000 -0400
+++ Makefile    2015-08-10 22:42:28.962047475 -0400
@@ -20,14 +20,16 @@
 # Profiling
 #PROFILING=-g -fprofile-arcs -ftest-coverage
 #PROFILING=-g -pg
-PROFILING=-g
+#PROFILING=-g
+PROFILING=

 # Compiler warnings
 WARN=-Wall -Wsign-compare
 #WARN=-Weverything

 CXX=g++
-CXXFLAGS=-O3 -msse2 -mtune=core2 -Icityhash $(WARN) $(PROFILING)
+#CXXFLAGS=-O3 -msse2 -mtune=core2 -Icityhash $(WARN) $(PROFILING)
+CXXFLAGS=-O3 -march=corei7-avx -Icityhash $(WARN) $(PROFILING)
 LINKFLAGS=$(PROFILING)
 LIBS=-lpthread

@@ -64,4 +66,4 @@
        $(CXX) $(CXXFLAGS) -mssse3 -DSSSE3 -c -o $@ $<

 cpu_sse2.o : cpu.cc $(DEPS)
-       $(CXX) $(CXXFLAGS) -c -o $@ $<
+       $(CXX) $(CXXFLAGS) -msse2 -c -o $@ $<

And compile it:

[juser@proteusa01 vsearch-1.1.3]$ make -j 8 | tee Make.out

This creates the executable named vsearch.

Test

This test runs the just-built vsearch in src/. The results of a run using the version installed on Proteus are included below:

[juser@proteusa01 eval]$ ./eval.sh v
Creating random test set

Running search
vsearch v1.1.3_linux_x86_64, 63.0GB RAM, 16 cores
https://github.com/torognes/vsearch

Reading file ./db.fsa 100%
52576347 nt in 380472 seqs, min 32, max 1875, avg 138
WARNING: 447 sequences shorter than 32 nucleotides discarded.
Indexing sequences 100%
Masking 100%
Counting unique k-mers 100%
Creating index of unique k-mers 100%
Searching 100%
Matching query sequences: 208000 of 208500 (99.76%)
318.97user 0.68system 0:24.15elapsed 1323%CPU (0avgtext+0avgdata 715288maxresident)k
0inputs+37864outputs (0major+62379minor)pagefaults 0swaps

Results
Total queries:               208500
True positives:              193500
False positives:              14500
False negatives:              15000

Recall:                          92.81%
Precision:                       93.03%
False negative rate:              7.19%
False discovery rate:             6.97%
F-score:                         92.92%

Matched (id>=70%;cov>=90%):  154700
Matched percentage:              74.20%
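The summary figures can be re-derived from the raw counts. The sketch below uses the standard definitions of recall, precision and F-score, which reproduce the values shown above; that eval.sh computes them in exactly this way is an assumption.

```shell
# Counts copied from the results above.
tp=193500      # true positives
fp=14500       # false positives
fneg=15000     # false negatives

metrics=$(awk -v tp="$tp" -v fp="$fp" -v fneg="$fneg" 'BEGIN {
    recall    = tp / (tp + fneg)      # fraction of all queries found correctly
    precision = tp / (tp + fp)        # fraction of reported matches that are correct
    fscore    = 2 * precision * recall / (precision + recall)
    printf "Recall: %.2f%%\n",               100 * recall
    printf "Precision: %.2f%%\n",            100 * precision
    printf "False negative rate: %.2f%%\n",  100 * (1 - recall)
    printf "False discovery rate: %.2f%%\n", 100 * (1 - precision)
    printf "F-score: %.2f%%\n",              100 * fscore
}')
echo "$metrics"
```

Note that true positives plus false negatives equals the total number of queries (208500), and true positives plus false positives equals the number of matched queries (208000).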

Install

Copy the executable src/vsearch to any convenient location in your PATH.

Compiling 1.10.0

Version 1.10.0 has a configure script. Despite what the configure options imply, the resulting executable is not linked with bzip2 or zlib.

Configure

./configure CXXFLAGS="-O3 -march=corei7-avx -mtune=corei7-avx" CFLAGS="-O3 -march=corei7-avx -mtune=corei7-avx" --prefix=/mnt/HA/opt/vsearch/gcc/1.10.0 --enable-bzip2 --enable-zlib
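One way to check the linkage of the installed binary is to inspect its dynamic dependencies with ldd. This is a minimal sketch: the function name is hypothetical, and the path assumes the executable was installed under bin/ in the prefix given above. ldd only lists libraries linked at build time, so a library loaded at run time would not appear.

```shell
# Report whether a binary is dynamically linked against zlib or bzip2.
check_compression_libs() {
    if [ ! -x "$1" ]; then
        echo "not an executable file: $1"
        return 1
    fi
    if ldd "$1" 2>/dev/null | grep -Eq 'libz\.so|libbz2\.so'; then
        echo "linked against zlib and/or bzip2"
    else
        echo "no zlib or bzip2 dependency reported"
    fi
}

# Assumed install location: bin/ under the configure prefix.
VSEARCH_BIN=/mnt/HA/opt/vsearch/gcc/1.10.0/bin/vsearch
result=$(check_compression_libs "$VSEARCH_BIN") || true
echo "$result"
```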

Compiling 2.7.1

Please see the official instructions. Contrary to what they suggest, sudo/root access is not required: set the prefix to a non-privileged location at the configure step.

Pre-configure

module load autoconf automake
./autogen.sh

Configure

./configure CXXFLAGS="-O3" CFLAGS="-O3" --prefix=/mnt/HA/groups/myrsrchGrp --enable-bzip2 --enable-zlib

Install

make -j 12 install

Then set your PATH and MANPATH appropriately.
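For the prefix used in the configure step, the environment settings might look like the sketch below. It assumes the default autoconf layout, with executables in bin/ and man pages in share/man/ under the prefix; check the install output if your layout differs.

```shell
# Prefix passed to ./configure above.
PREFIX=/mnt/HA/groups/myrsrchGrp

# Put the new vsearch ahead of any other version on the search path.
export PATH="$PREFIX/bin:$PATH"

# If MANPATH was unset, the trailing colon left here tells man
# to keep searching its default locations as well.
export MANPATH="$PREFIX/share/man:${MANPATH:-}"
```

Adding these lines to your shell startup file makes the setting persistent across logins.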

References

[1] VSEARCH GitHub repository

[2] USEARCH official website