Job Script Example 04 Ray assembler

This is an example of using the Ray de novo assembler. This duplicates one of the examples from the Laboratory of Genomics, Evolution, and Development at Michigan State University.[1]

This example plus input files are in:

/mnt/HA/opt/Examples/Ray

Please see the README file in that directory.

Obtaining Input Files and Renaming Them

The input files are provided by the author. However, they need to be uncompressed and renamed.

Download Input Files

Download the single tar file, and expand:

[juser@proteusa01 ray-example]$ wget https://s3.amazonaws.com/public.ged.msu.edu/hmp-mock-subsets.tar
[juser@proteusa01 ray-example]$ tar xf hmp-mock-subsets.tar

This creates a subdirectory named "hmp-mock-subsets" containing gzipped FASTA files named with the pattern "*.fa.gz". All the commands in the webinar/tutorial page refer to files named "*.fasta", so the files have to be uncompressed, and then renamed.

First, uncompress:

[juser@proteusa01 ray-example]$ cd hmp-mock-subsets
[juser@proteusa01 hmp-mock-subsets]$ gunzip *.gz

Do an ls to see the files. The following snippet will rename the files if your shell is bash. You are on your own for a csh derivative.

[juser@proteusa01 hmp-mock-subsets]$ for x in *.fa ; do y=$(echo "$x" | sed -e 's/\.fa$/.fasta/') ; mv "$x" "$y" ; done
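
The same rename can be done without sed by using bash parameter expansion. The following is a minimal sketch, not part of the original example; the function name rename_fa_to_fasta is hypothetical. It skips the loop body when no "*.fa" files exist, and uses "mv -n" so an existing "*.fasta" file is not overwritten.

```shell
#!/bin/bash
# Hypothetical helper (not from the original example):
# rename every *.fa file in a directory to *.fasta.
rename_fa_to_fasta() {
    local dir="${1:-.}"
    local x
    for x in "$dir"/*.fa; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$x" ] || continue
        # ${x%.fa} strips the trailing ".fa"; -n refuses to overwrite.
        mv -n -- "$x" "${x%.fa}.fasta"
    done
}

# Usage, from inside hmp-mock-subsets:
#   rename_fa_to_fasta .
```

Quoting "$x" keeps the loop safe for file names that contain spaces.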

Test Job

Go back to the top-level directory:

[juser@proteusa01 hmp-mock-subsets]$ cd ..
[juser@proteusa01 ray-example]$

Create the job script, named "raytest.sh".

#!/bin/bash
#$ -S /bin/bash
#$ -P myrsrchPrj
#$ -M myname@drexel.edu
#$ -j y
#$ -cwd
#$ -pe shm 64
#$ -l vendor=amd
#$ -l h_rt=4:00:00
#$ -l h_vmem=4g
#$ -l m_mem_free=3g

. /etc/profile.d/modules.sh
module load shared
module load gcc
module load sge/univa
module load proteus
module load proteus-openmpi/gcc/64/1.8.1-mlnx-ofed

### leave off "-n NNN" since OpenMPI can infer number of available processors from the job environment
### k is the k-mer length
### this command creates a directory p4.ray.31 and puts all output there
${MPI_RUN} Ray -k 31 -s hmp-mock-subsets/partition4.orig.fasta -o p4.ray.31
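
Once the script is saved, it can be submitted with qsub and the result checked after the job finishes. The sketch below is an illustration only: count_fasta_records is a hypothetical helper, and the file name Contigs.fasta is an assumption about what Ray writes into the output directory, so list p4.ray.31 to confirm the actual names.

```shell
#!/bin/bash
# Hypothetical helper (not from the original example):
# print the number of sequences in a FASTA file.
count_fasta_records() {
    # Every FASTA record begins with a ">" header line; grep -c counts them.
    grep -c '^>' "$1"
}

# Submit the job from the ray-example directory:
#   qsub raytest.sh
#
# After the job completes, compare input and output sizes.
# Contigs.fasta is an assumed output file name; check "ls p4.ray.31".
#   count_fasta_records hmp-mock-subsets/partition4.orig.fasta
#   count_fasta_records p4.ray.31/Contigs.fasta
```

Counting header lines is only a quick sanity check that the assembler produced output; it says nothing about assembly quality.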

NOTE: This runs much faster on a single node (i.e. request "-pe shm 64") than on two nodes (i.e. "-pe openmpi_ib 128"). With 64 slots on a single node, run time was about 73 seconds. With 128 slots across two nodes, the job had not finished after 30 minutes and was killed when it reached the h_rt limit of 30 minutes.

References

[1] GED@MSU - HMP Assembly Webinar 2013