Compiling hapbin
hapbin is a collection of tools for efficiently calculating the Extended Haplotype Homozygosity (EHH), Integrated Haplotype Score (iHS), and Cross Population Extended Haplotype Homozygosity (XP-EHH) statistics.[1]
Download
Clone the repo:
[juser@proteusa01 ~]$ git clone https://github.com/evotools/hapbin.git
Read the documentation in the file README.md.
Requirements
Needs a CMake which uses GCC 4.8.1 (loaded with the modulefile "gcc/4.8.1"), so:
[juser@proteusa01 ~]$ module load cmake/gcc/3.2.1
Needs MPI if you want to handle very large files, so:
[juser@proteusa01 ~]$ module load proteus-openmpi/gcc/64/1.8.1-mlnx-ofed
The generated makefile defaults to optimizing specifically for the hardware of the system on which the cmake command is run. This means that if you build on proteusi01, you can only run on the Intel nodes. And if you build on proteusa01, you can only run on the AMD nodes.
CMake Toolchain File
The OpenMPI paths need to be set, and the compilers set to the MPI-specific ones. Create this toolchain file, named "PROTEUS-toolchain-mpi.cmake", and place it in the "build" directory:
### PROTEUS - this uses MPI -- only useful if you have very large files
string(REPLACE "home" "work" WORK_PATH $ENV{HOME})
set(SOFTWARE_PATH "/mnt/HA/groups/myresearchGrp/SOFTWARE")
# Must force-set in order to be correctly set by CMake on the first run of cmake.
set(CMAKE_INSTALL_PREFIX "${SOFTWARE_PATH}/hapbin" CACHE STRING "Install path" FORCE)
set(CMAKE_C_COMPILER mpicc)
set(CMAKE_CXX_COMPILER mpic++)
set(MPI_HOME "/mnt/HA/opt/openmpi/gcc/64/1.8.1-mlnx-ofed")
set(MPI_C_LIBRARIES "${MPI_HOME}/lib/libmpi.a")
set(MPI_C_INCLUDE_PATH "${MPI_HOME}/include")
set(MPI_CXX_LIBRARIES "${MPI_HOME}/lib/libmpi_cxx.a")
set(MPI_CXX_INCLUDE_PATH "${MPI_HOME}/include")
set(MPI_EXTRA_LIBRARY "/opt/mellanox/mxm/lib/libmxm.a")
set(CMAKE_BUILD_WITH_INSTALL_RPATH TRUE)
set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib")
#for MPIRPC as a separate project
#set(CMAKE_PREFIX_PATH "$ENV{HOME}/install/")
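The string(REPLACE ...) line at the top derives a matching path under /work from your home directory. WORK_PATH is not used elsewhere in the snippet as shown, but it is available if you prefer to install under your work directory instead of the hard-coded SOFTWARE_PATH. A rough shell equivalent, using a hypothetical home directory (CMake's string(REPLACE) substitutes every occurrence; the bash form below substitutes the first, which is the same thing for a typical path):

```shell
# Illustration of what string(REPLACE "home" "work" WORK_PATH $ENV{HOME})
# computes in the toolchain file.
HOME_DIR=/home/juser                # hypothetical value of $ENV{HOME}
WORK_PATH=${HOME_DIR/home/work}     # substitute "home" -> "work"
echo "$WORK_PATH"                   # prints /work/juser
```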
Compiling and Installing
First, generate makefiles:
[juser@proteusa01 build]$ cmake ../src/ -DCMAKE_TOOLCHAIN_FILE=PROTEUS-toolchain-mpi.cmake
Then, install:
[juser@proteusa01 build]$ make -j 8 install
This puts the executables in /mnt/HA/groups/myresearchGrp/SOFTWARE/hapbin/bin and the libraries in .../hapbin/lib
LD_LIBRARY_PATH does not need to be set manually at run time since the paths are hard coded at compile time (using rpath).
Running
In your job script, you do not need to load the cmake module, but you must load the OpenMPI one:
module load proteus-openmpi/gcc/64/1.8.1-mlnx-ofed
Since this is an MPI program, you must run it with ${MPI_RUN}:[2]
#$ -pe openmpi_ib 128
#$ -l vendor=amd
...
${MPI_RUN} -x LD_LIBRARY_PATH hapbin_command
Since the OpenMPI installation here is integrated with the scheduler, you do not need to specify the number of slots to ${MPI_RUN}.
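Putting the pieces together, a complete job script might look like the following sketch. The run-time limit and slot count are example values, and hapbin_command is a placeholder: replace it with the actual executable and its options (see README.md):

```shell
#!/bin/bash
# Hypothetical Grid Engine job script for hapbin; adjust the slot count,
# resource requests, and command line for your own data.
#$ -S /bin/bash
#$ -cwd
#$ -pe openmpi_ib 128
#$ -l vendor=amd
#$ -l h_rt=04:00:00

# OpenMPI is required at run time; cmake is not.
module load proteus-openmpi/gcc/64/1.8.1-mlnx-ofed

BIN=/mnt/HA/groups/myresearchGrp/SOFTWARE/hapbin/bin

# Replace hapbin_command with the executable and options you need.
${MPI_RUN} -x LD_LIBRARY_PATH ${BIN}/hapbin_command
```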
Note on Scalability
More processors (slots) do not necessarily lead to better performance. Test with various numbers of slots to find the count that performs best for your data.
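One way to run this scaling test is to submit the same job at several slot counts and compare wall-clock times afterwards (e.g. with qacct). The job script name here is hypothetical:

```shell
# Submit the same (hypothetical) job script at several slot counts,
# giving each job a distinct name so the timings are easy to compare.
for n in 16 32 64 128; do
    qsub -pe openmpi_ib "$n" -N "hapbin_np${n}" hapbin.sh
done
```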