Job Script Example 05 pyRAD
This is an example of using PyRAD. The example may be found in:
/mnt/HA/opt/Examples/Example05
Please see the README file in that directory.
Introduction
This is a translation of the author's tutorial.[1] The original tutorial is presented in a Jupyter[2] notebook, which merely runs a lot of standard Linux commands. We extract them into typed-out commands, and a job script.
This translation will not include detailed commentary.
Load the necessary modulefiles:
[juser@proteusa01 pyrad-tut]$ module load python/2.7-current vsearch muscle pyrad
Download Sample Data
Download the example data set, and expand it.
[juser@proteusa01 pyrad-tut]$ wget -q dereneaton.com/downloads/simRADs.zip
[juser@proteusa01 pyrad-tut]$ unzip simRADs.zip
This should create these two files:
simRADs.barcodes
simRADs_R1.fastq.gz
Create pyRAD params.txt file
Create params.txt
:
[juser@proteusa01 pyrad-tut]$ pyRAD -n
Modify the file to look like this:
==** parameter inputs for pyRAD version 3.0.5 **======================== affected step ==
./ ## 1. Working directory (all)
./*.fastq.gz ## 2. Loc. of non-demultiplexed files (if not line 16) (s1)
./*.barcodes ## 3. Loc. of barcode file (if not line 16) (s1)
vsearch ## 4. command (or path) to call vsearch (or usearch) (s3,s6)
muscle ## 5. command (or path) to call muscle (s3,s7)
TGCAG ## 6. Restriction overhang (e.g., C|TGCAG -> TGCAG) (s1,s2)
4 ## 7. N processors (parallel) (all)
6 ## 8. Mindepth: min coverage for a cluster (s4,s5)
4 ## 9. NQual: max # sites with qual < 20 (or see line 20)(s2)
.85 ## 10. Wclust: clustering threshold as a decimal (s3,s6)
rad ## 11. Datatype: rad,gbs,pairgbs,pairddrad,(others:see docs)(all)
4 ## 12. MinCov: min samples in a final locus (s7)
3 ## 13. MaxSH: max inds with shared hetero site (s7)
c85m4p3 ## 14. Prefix name for final output (no spaces) (s7)
==== optional params below this line =================================== affected step ==
## 15.opt.: select subset (prefix* only selector) (s2-s7)
## 16.opt.: add-on (outgroup) taxa (list or prefix*) (s6,s7)
## 17.opt.: exclude taxa (list or prefix*) (s7)
## 18.opt.: loc. of de-multiplexed data (s2)
## 19.opt.: maxM: N mismatches in barcodes (def= 1) (s1)
## 20.opt.: phred Qscore offset (def= 33) (s2)
## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict (s2)
## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5)
## 23.opt.: maxN: max Ns in a cons seq (def=5) (s5)
8 ## 24.opt.: maxH: max heterozyg. sites in cons seq (def=5) (s5)
## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5)
## 26.opt.: maxSNPs: (def=100). Paired (def=100,100) (s7)
## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7)
## 28.opt.: random number seed (def. 112233) (s3,s6,s7)
## 29.opt.: trim overhang left,right on final loci, def(0,0) (s7)
* ## 30.opt.: output formats: p,n,a,s,v,u,t,m,k,g,* (see docs) (s7)
## 31.opt.: maj. base call at depth>x<mindepth (def.x=mindepth) (s5)
## 32.opt.: keep trimmed reads (def=0). Enter min length. (s2)
## 33.opt.: max stack size (int), def= max(500,mean+2*SD) (s3)
## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3)
## 35.opt.: use hierarchical clustering (def.=0, 1=yes) (s6)
## 36.opt.: repeat masking (def.=1='dust' method, 0=no) (s3,s6)
16 ## 37.opt.: vsearch max threads per job (def.=6; see docs) (s3,s6)
==== optional: list group/clade assignments below this line (see docs) ==================
Suggestion for lines 7 and 37: TBA.
Create Job Script and Submit
Then, write this job script for submission.
#!/bin/bash
#$ -S /bin/bash
#$ -P myrsrchPrj
#$ -M myname@drexel.edu
#$ -cwd
#$ -j y
#$ -l h_rt=2:00:00
#$ -l m_mem_free=3g
#$ -l h_vmem=4g
#$ -l vendor=amd
#$ -pe shm 64
. /etc/profile.d/modules.sh
module load shared
module load sge/univa
module load gcc
module load proteus
module load python/2.7-current
module load vsearch
module load muscle
module load pyrad
pyRAD -p params.txt -s 1
pyRAD -p params.txt -s 2
pyRAD -p params.txt -s 3
pyRAD -p params.txt -s 4
pyRAD -p params.txt -s 5
pyRAD -p params.txt -s 6
pyRAD -p params.txt -s 7
### Instead of doing those seven steps separately, you should be able to do it all in one go. See documentation.
The final stats will be in the file "stats/c85m4p3.stats
".
Resource Usage
The resource usage from running this exact example is (see the file
job_resource_usage.241240.txt
):
==============================================================
qname all.q
hostname ac03n02.cm.cluster
group urcfadm
owner dwc62
project urcfadmPrj
department defaultdepartment
jobname pyradtest.sh
jobnumber 241240
taskid undefined
account sge
priority 0
cwd /lustre/scratch/examples/pyRAD
submit_host proteusi01.cm.cluster
submit_cmd qsub pyradtest.sh
qsub_time 08/11/2015 00:30:25.173
start_time 08/11/2015 00:30:26.327
end_time 08/11/2015 00:39:16.929
granted_pe shm
slots 64
failed 0
deleted_by NONE
exit_status 0
ru_wallclock 530.602
ru_utime 1571.871
ru_stime 312.234
ru_maxrss 67260
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 33705489
ru_majflt 94
ru_nswap 0
ru_inblock 66576
ru_oublock 198264
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 263490
ru_nivcsw 58258
cpu 1884.105
mem 583.045
io 1.255
iow 0.000
maxvmem 2.855G
maxrss 229.281M
maxpss 164.399M
arid undefined
jc_name NONE
Highlights:
- run time (wallclock): about 9 minutes
- total memory usage (maxvmem): 2.855 GiB
References
[1] pyRAD SE RAD Tutorial - Example de novo RADseq assembly using pyRAD