Compiling hapbin

hapbin is a collection of tools for efficiently calculating Extended Haplotype Homozygosity (EHH), the Integrated Haplotype Score (iHS), and the Cross Population Extended Haplotype Homozygosity (XP-EHH) statistic.[1]

Download

Clone the repo:

[juser@proteusa01 ~]$ git clone https://github.com/evotools/hapbin.git

Read the documentation in the file README.md.

Requirements

A CMake that uses GCC 4.8.1 (the compiler loaded by the modulefile "gcc/4.8.1") is needed, so:

[juser@proteusa01 ~]$ module load cmake/gcc/3.2.1

MPI is needed if you want to handle "very large" files, so:

[juser@proteusa01 ~]$ module load proteus-openmpi/gcc/64/1.8.1-mlnx-ofed

The generated makefile defaults to optimizing specifically for the hardware of the system on which the cmake command is run. This means that binaries built on proteusi01 will run only on the Intel nodes, and binaries built on proteusa01 will run only on the AMD nodes.
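Because the build is tuned to the host it is compiled on, it can help to check which CPU vendor the current build node has before running cmake. A quick sketch using the standard Linux /proc interface:

```shell
# Print the CPU vendor of the node you are building on:
#   "GenuineIntel"  -> binaries will run on the Intel nodes
#   "AuthenticAMD"  -> binaries will run on the AMD nodes
grep -m1 'vendor_id' /proc/cpuinfo
```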

CMake Toolchain File

The OpenMPI paths need to be set, and the compilers must be the MPI-specific wrappers. Create this toolchain file, named "PROTEUS-toolchain-mpi.cmake", and place it in the "build" directory:

### PROTEUS - this uses MPI -- only useful if you have very large files
string(REPLACE "home" "work" WORK_PATH $ENV{HOME})
#Must force set in order to be correctly set by CMake on the first run of cmake.
set(SOFTWARE_PATH "/mnt/HA/groups/myresearchGrp/SOFTWARE")
set(CMAKE_INSTALL_PREFIX "${SOFTWARE_PATH}/hapbin" CACHE STRING "Install path" FORCE)
set(CMAKE_C_COMPILER mpicc)
set(CMAKE_CXX_COMPILER mpic++)
set(MPI_HOME "/mnt/HA/opt/openmpi/gcc/64/1.8.1-mlnx-ofed")
set(MPI_C_LIBRARIES "${MPI_HOME}/lib/libmpi.a")
set(MPI_C_INCLUDE_PATH "${MPI_HOME}/include")
set(MPI_CXX_LIBRARIES "${MPI_HOME}/lib/libmpi_cxx.a")
set(MPI_CXX_INCLUDE_PATH "${MPI_HOME}/include")
set(MPI_EXTRA_LIBRARY "/opt/mellanox/mxm/lib/libmxm.a")
SET(CMAKE_BUILD_WITH_INSTALL_RPATH TRUE)
SET(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib")

#for MPIRPC as a separate project
#set(CMAKE_PREFIX_PATH "$ENV{HOME}/install/")
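The directory layout assumed in the next section is a "build" directory alongside "src" inside the clone (a sketch; the clone location ~/hapbin is an assumption):

```shell
# Assumed layout: the clone lives in ~/hapbin, with "src" and "build" as
# siblings inside it; the cmake command below is run from build/.
mkdir -p ~/hapbin/build
# Save the toolchain file above as ~/hapbin/build/PROTEUS-toolchain-mpi.cmake
```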

Compiling and Installing

First, generate makefiles:

[juser@proteusa01 build]$ cmake ../src/ -DCMAKE_TOOLCHAIN_FILE=PROTEUS-toolchain-mpi.cmake

Then, install:

[juser@proteusa01 build]$ make -j 8 install

This puts the executables in

/mnt/HA/groups/myresearchGrp/SOFTWARE/hapbin/bin

and the libraries in

.../hapbin/lib

LD_LIBRARY_PATH does not need to be set manually at run time since the paths are hard coded at compile time (using rpath).
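To confirm that an rpath really was embedded, you can inspect the dynamic section of an installed executable with readelf. A sketch: the BIN value below is a placeholder (set here to /bin/ls only so the command runs anywhere); point it at an actual file under .../hapbin/bin instead.

```shell
# BIN is a placeholder; substitute a real executable from .../hapbin/bin.
BIN=/bin/ls
readelf -d "$BIN" | grep -E 'R(UN)?PATH' || echo "no rpath/runpath recorded"
```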

Running

In your job script, you do not need to load the cmake module, but you must load the OpenMPI one:

module load proteus-openmpi/gcc/64/1.8.1-mlnx-ofed

Since this is an MPI program, you must run it with ${MPI_RUN}:[2]

#$ -pe openmpi_ib 128
#$ -l vendor=amd
...
${MPI_RUN} -x LD_LIBRARY_PATH hapbin_command

Since the OpenMPI installed here is integrated with the scheduler, you do not need to specify the number of slots to MPI_RUN.

Note on Scalability

It is not necessarily the case that more processors (slots) lead to better performance. Test with various numbers of slots to find the count that performs best.
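One way to run such a scaling test is to submit the same job script at several slot counts. This sketch only previews the qsub commands (remove the "echo" to actually submit; "myjob.sh" is a hypothetical name for your hapbin job script):

```shell
# Preview submissions of the same job at a range of slot counts.
for slots in 16 32 64 128; do
    echo qsub -pe openmpi_ib "$slots" myjob.sh
done
```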

References

[1] hapbin GitHub repository

[2] Message Passing Interface#Running 2