Job Script Example 04 Ray assembler
This is an example of using the Ray de novo assembler. It reproduces one of the examples from the Laboratory of Genomics, Evolution, and Development at Michigan State University.[1]
This example and its input files are in:
/mnt/HA/opt/Examples/Ray
Please see the README file in that directory.
Obtaining Input Files and Renaming Them
The input files are provided by the tutorial author; however, they need to be uncompressed and renamed.
Download Input Files
Download the single tar file, and expand:
[juser@proteusa01 ray-example]$ wget https://s3.amazonaws.com/public.ged.msu.edu/hmp-mock-subsets.tar
[juser@proteusa01 ray-example]$ tar xf hmp-mock-subsets.tar
This creates a subdirectory named "hmp-mock-subsets" containing gzipped FASTA files named with the pattern "*.fa.gz". All the commands in the webinar/tutorial page refer to files named "*.fasta", so the files have to be uncompressed and then renamed.
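To confirm the contents before proceeding, list the new directory:
[juser@proteusa01 ray-example]$ ls hmp-mock-subsets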
First, uncompress:
[juser@proteusa01 ray-example]$ cd hmp-mock-subsets
[juser@proteusa01 hmp-mock-subsets]$ gunzip *.gz
Do an ls to see the files. The following snippet will rename the files if your shell is bash; if you use a csh derivative, you are on your own.
[juser@proteusa01 hmp-mock-subsets]$ for x in *.fa ; do y=$(echo "$x" | sed -e 's/fa$/fasta/') ; mv "$x" "$y" ; done
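Alternatively, bash parameter expansion does the same rename without invoking sed; here "${x%.fa}" strips the trailing ".fa" suffix, and ".fasta" is appended in its place:
[juser@proteusa01 hmp-mock-subsets]$ for x in *.fa ; do mv "$x" "${x%.fa}.fasta" ; done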
Test Job
Go back to the top-level directory:
[juser@proteusa01 hmp-mock-subsets]$ cd ..
[juser@proteusa01 ray-example]$
Create the job script, named "raytest.sh".
#!/bin/bash
#$ -S /bin/bash
#$ -P myrsrchPrj
#$ -M myname@drexel.edu
#$ -j y
#$ -cwd
#$ -pe shm 64
#$ -l vendor=amd
#$ -l h_rt=4:00:00
#$ -l h_vmem=4g
#$ -l m_mem_free=3g
. /etc/profile.d/modules.sh
module load shared
module load gcc
module load sge/univa
module load proteus
module load proteus-openmpi/gcc/64/1.8.1-mlnx-ofed
### leave off "-n NNN" since OpenMPI can infer number of available processors from the job environment
### k is the k-mer length
### this command creates a directory p4.ray.31 and puts all output there
${MPI_RUN} Ray -k 31 -s hmp-mock-subsets/partition4.orig.fasta -o p4.ray.31
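Once the script is saved, submit it and check its status with the standard Grid Engine commands (the exact qstat output will vary):
[juser@proteusa01 ray-example]$ qsub raytest.sh
[juser@proteusa01 ray-example]$ qstat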
NOTE: This runs much faster on a single node (i.e. requesting "-pe shm 64") than on 2 nodes (i.e. "-pe openmpi_ib 128"). Run time on 64 slots on a single node was about 73 seconds. Run time on 128 slots across 2 nodes exceeded 30 minutes: that job was killed when it reached the 30-minute h_rt limit set for that test run.
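When the job completes, all output is in the directory given with "-o". As a quick sanity check (assuming the usual Ray output layout, where the assembled contigs are written to a file named Contigs.fasta), list the directory and count the assembled sequences by their FASTA headers:
[juser@proteusa01 ray-example]$ ls p4.ray.31
[juser@proteusa01 ray-example]$ grep -c '^>' p4.ray.31/Contigs.fasta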