Phyluce
phyluce is a software package that was initially developed for analyzing data collected from ultraconserved elements in organismal genomes.[1]
Installing phyluce
Prerequisite
phyluce requires a recent version of git to run.[2] Load this module:
module load git
Java Dependency
phyluce requires Java 1.7.0, which is already installed on all Proteus nodes. However, the default on Proteus is Java 1.8.0. To make Java 1.7.0 your default, add these lines to ~/.bashrc:
# Use java 1.7.0 as default
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-oracle.x86_64
export PATH=${JAVA_HOME}/bin:${PATH}
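To confirm that the change took effect in a new shell, a quick check along the following lines may help. This is only a sketch: the function name is_java_17 is a hypothetical helper, not part of phyluce or Proteus, and it assumes the first line of the "java -version" banner contains the quoted version string (for example "1.7.0_80").

```shell
#!/bin/bash
# Hypothetical helper (not part of phyluce): succeeds if a "java -version"
# banner line reports a 1.7.x release.
is_java_17() {
    case "$1" in
        *'"1.7.'*) return 0 ;;
        *)         return 1 ;;
    esac
}

# "java -version" writes its banner to stderr, so redirect it to stdout.
if command -v java >/dev/null 2>&1; then
    banner=$(java -version 2>&1 | head -n 1)
    if is_java_17 "$banner"; then
        echo "OK: $banner"
    else
        echo "WARNING: expected Java 1.7.x but found: $banner" >&2
    fi
else
    echo "WARNING: no java found on PATH" >&2
fi
```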
Anaconda Python 2.7
phyluce requires an Anaconda-based Python 2.7. The current (as of 2017-10-16) Anaconda version 5.0.0.1 works, provided phyluce is installed in a separate virtualenv.
virtualenv
If you run "conda install phyluce" directly, you will likely get an error about a conflict with a package such as "pywavelets" or "bottleneck":
$ conda install phyluce
Fetching package metadata ...........
Solving package specifications: .
UnsatisfiableError: The following specifications were found to be in conflict:
  - phyluce -> biopython 1.63 -> numpy 1.7* -> mkl 10.3
  - pywavelets
Use "conda info
The fix is to set up a separate virtualenv:[3][4]
### this creates a new virtualenv named "phyluce"
[juser@proteusi01 ~]$ conda create --name phyluce python=2
[juser@proteusi01 ~]$ source activate phyluce
[juser@proteusi01 ~]$ conda install phyluce
# do some work
# when finished
[juser@proteusi01 ~]$ source deactivate
ANACONDA_HOME
Because of this virtual environment, the ANACONDA_HOME environment variable needs to be adjusted. This should not be done in ~/.bashrc; it should be done only within the new "phyluce" environment.
[juser@proteusi01 ~]$ export ANACONDA_HOME=/mnt/HA/groups/myrsrchGrp/Software/anaconda2/envs/phyluce
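One way to avoid setting ANACONDA_HOME in the wrong environment is to guard the export. The sketch below is an illustration only: set_phyluce_anaconda_home is a hypothetical helper name, and it assumes that conda's activate script sets CONDA_DEFAULT_ENV to the environment name (some conda versions store the environment path there instead, which the pattern also tolerates).

```shell
#!/bin/bash
# Hypothetical helper: export ANACONDA_HOME only while the "phyluce"
# conda environment is active. Assumes CONDA_DEFAULT_ENV holds either the
# environment name or a path ending in the environment name.
set_phyluce_anaconda_home() {
    local env_dir="$1"
    case "${CONDA_DEFAULT_ENV:-}" in
        phyluce|*/phyluce)
            export ANACONDA_HOME="$env_dir"
            ;;
        *)
            echo "phyluce environment is not active; ANACONDA_HOME unchanged" >&2
            return 1
            ;;
    esac
}
```

After "source activate phyluce", it would be called with the path of the environment, for example the envs/phyluce directory shown above.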
Tutorial
You can test this out using the tutorial: https://phyluce.readthedocs.io/en/latest/tutorial-one.html
The tutorial has other dependencies:
- illumiprocessor
- trimmomatic
illumiprocessor
The illumiprocessor installation will also install trimmomatic:
conda install illumiprocessor
However, illumiprocessor defaults to searching for trimmomatic in ~/anaconda2. If your Anaconda is installed elsewhere, you will need to specify the path to trimmomatic (it is installed in a subdirectory of the Anaconda directory):
illumiprocessor --trimmomatic $ANACONDA_HOME/jar/trimmomatic.jar ....
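Before starting a long run, it can be worth checking that the jar really is where you expect it. The following is a minimal sketch, assuming the jar is named trimmomatic.jar and sits in the jar/ subdirectory of the Anaconda prefix, as in the commands on this page. find_trimmomatic_jar is a hypothetical helper, not an illumiprocessor feature.

```shell
#!/bin/bash
# Hypothetical helper: print the trimmomatic jar path under a given
# Anaconda prefix (default: ~/anaconda2), or fail if the jar is missing.
find_trimmomatic_jar() {
    local prefix="${1:-$HOME/anaconda2}"
    local jar="$prefix/jar/trimmomatic.jar"
    if [ -f "$jar" ]; then
        printf '%s\n' "$jar"
    else
        echo "trimmomatic.jar not found under $prefix/jar" >&2
        return 1
    fi
}
```

The result could then be passed on the command line, e.g. --trimmomatic "$(find_trimmomatic_jar "$ANACONDA_HOME")".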
To try out the tutorial, request an interactive session with plenty of memory: in the tests reported below, the illumiprocessor step consumed roughly 146 to 156 GB.
qlogin -l exclusive -l m_mem_free=3g -l h_vmem=4G -pe shm 64 -l h_rt=4:00:00
Note the lower case "g" in "m_mem_free=3g" and the upper case "G" in "h_vmem=4G". Since the only nodes with enough free memory to satisfy the request for 64 × 3 GB = 192 GB are the AMD nodes, this will give an interactive session on an AMD node.
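The arithmetic behind the request can be sketched in the shell. The helper name total_mem_gb is hypothetical, and the sketch assumes that m_mem_free is requested per slot, as the 64 × 3 calculation above implies.

```shell
#!/bin/bash
# Hypothetical helper: total memory (in GB) requested by a parallel job
# that asks for memory per slot, i.e. slots * per-slot GB.
total_mem_gb() {
    local slots="$1" per_slot_gb="$2"
    echo $(( slots * per_slot_gb ))
}

# -pe shm 64 with m_mem_free=3g
total_mem_gb 64 3   # prints 192
```

The same function can be used to try other slot counts: 32 slots at 3 GB each would request only 96 GB, which is below the peak usage reported on this page.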
Once the session starts, you must execute your ~/.bashrc manually to set up your environment:
. ~/.bashrc
For example, to run the tutorial's illumiprocessor step with 16 cores:
illumiprocessor --trimmomatic $ANACONDA_HOME/jar/trimmomatic.jar \
--input raw-fastq/ \
--output clean-fastq \
--config illumiprocessor.conf \
--cores 16
Memory consumption seems fairly independent of the number of cores used:
- 4 cores used ~ 146 GB
- 16 cores used ~ 148 GB
- 64 cores used ~ 156 GB
The run time was also similar (within a few seconds) for all three.
Use in Job Scripts
The customizations above need to be included in job scripts to ensure a correct environment.
#!/bin/bash
#$ -S /bin/bash
... ### FILL IN THE BLANKS WITH APPROPRIATE
#$ -pe shm 64
#$ -l m_mem_free=3G
#$ -l h_vmem=4G
#$ -l h_rt=12:00:00
. ~/.bashrc
. /etc/profile.d/modules.sh
module load shared
module load gcc
module load sge/univa
module load proteus
module load git
source activate phyluce
export ANACONDA_HOME=/mnt/HA/groups/myrsrchGrp/Software/anaconda2/envs/phyluce
illumiprocessor --input raw-fastq/ --output clean-fastq --config illumiprocessor.conf --trimmomatic $ANACONDA_HOME/jar/trimmomatic.jar --cores $NSLOTS
XXX ...
source deactivate
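The job script above passes $NSLOTS to --cores. Grid Engine sets NSLOTS only inside a job, so if the same commands are tried outside the scheduler the variable is empty. A small guard such as the following sketch supplies a fallback. cores_to_use is a hypothetical helper, and the fallback of 1 core is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: number of cores to pass to --cores.
# Grid Engine sets NSLOTS inside a job; fall back to 1 elsewhere.
cores_to_use() {
    echo "${NSLOTS:-1}"
}
```

In the script, --cores $NSLOTS would then become --cores "$(cores_to_use)".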
References
[2] phyluce GitHub issue #59 "OSError", comment by brantfaircloth
[3] phyluce GitHub issue #57 "bottleneck conflict during install...", comment by brantfaircloth