ADAM
ADAM provides both an application programming interface (API) and a command line interface (CLI) for manipulating genomic data on a computing cluster.[1] It is part of the Big Data Genomics project.[2]
NOTE ADAM is beta software.
Installed Versions
ADAM 0.17.0 (using Scala 2.10) and ADAM 0.22.0 (using Spark 2.x and Scala 2.11) are installed on Proteus. Use one of the modulefiles:
adam/0.17.0
adam/0.22.0
Usage
Job scripts should follow the requirements for Apache Spark jobs. Here is an example job script:
#!/bin/bash
#$ -S /bin/bash
#$ -P myrsrchPrj
#$ -j y
#$ -cwd
#$ -M myname@drexel.edu
#$ -jc spark.amd
#$ -pe spark.amd 128
#$ -l vendor=amd
#$ -l exclusive
#$ -l h_rt=0:30:00
#$ -l h_vmem=4g
#$ -l m_mem_free=3g
. /etc/profile.d/modules.sh
module load shared
module load gcc
module load sge/univa
module load proteus
module load adam/0.17.0
###
### Set up environment for Spark job
###
export SPARK_CONF_DIR=${SGE_O_WORKDIR}/conf.${JOB_ID}
. ${SPARK_CONF_DIR}/spark-env.sh
./jenkins-test.proteus.sh
Example
See:
Building ADAM
You may wish to build ADAM yourself to keep up with ongoing
developments. If you do, select either a release or a branch which uses
Scala 2.10.x -- releases are usually named something like
"0.17.0_2.10
" with the Scala version dependency appended to the ADAM
version string.
Building will require Maven >= 3.1, and the appropriate Apache Spark version:
[juser@proteusa01 adam-adam-parent-2.10-0.17.0]$ module load apache/maven
[juser@proteusa01 adam-adam-parent-2.10-0.17.0]$ module load apache/spark/1.2.0
[juser@proteusa01 adam-adam-parent-2.10-0.17.0]$ mvn clean package | tee Maven.package.out
Testing
To integrate with the Spark installation on Proteus, the jenkins-test
script needs to be modified. Please see the example in
/mnt/HA/examples/ADAM/jenkins/
.