Anaconda
The open source Anaconda Distribution is an easy way to get started with Python and R. It provides the conda package manager, to install packages for Python and R, as well as different versions of the Python and R language. It also provides a large repository of Python and R packages, many of which are optimized for current CPU hardware from Intel.
CAUTION Installing your own copy or copies of Anaconda will cause errors if you decide to use one of the system-installed Anaconda versions which provide Python and/or R. Additionally, a full Anaconda installation may cause your storage usage to exceed the 17 GB hard quota on Proteus home directories (Picotte home quota is 64 GB). The Anaconda installer will ignore all warnings generated by the system, and continue trying to download packages: this goes on until the installer process is terminated. While this goes on, the server providing home and group directories becomes overloaded, and every process which tries to access the home or group directories stalls.
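Before installing anything large, it can help to check how much space a directory already uses and compare it with the quota. The following is a minimal sketch, not a URCF-provided tool: the function name dir_usage_gb is made up for illustration, and it assumes a du command that accepts the -sk options.

```shell
# Print the disk usage of a directory in whole gigabytes (rounded up).
# dir_usage_gb is a hypothetical helper name, not a URCF-provided command.
dir_usage_gb() {
    dir=${1:-$HOME}
    # du -sk reports the total size in 1024-byte blocks
    kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
    # Round up to the next whole GB (1 GB = 1048576 KB)
    echo $(( (kb + 1048575) / 1048576 ))
}

# Usage: compare the result with your home quota before installing, e.g.
#   dir_usage_gb "$HOME"
```

Running du over a large home directory can itself take a while, so it is better run once rather than in a loop.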
CAUTION FOR CUDA/GPU USERS Anaconda provides its own set of CUDA libraries and utilities. These may degrade the performance of your own code, or of other applications such as Matlab.
URCF-Installed Anaconda vs. Self-Installed Anaconda vs. Group-Installed Anaconda
In general, it is not easy to switch between different Anaconda installations due to the way Anaconda uses shell functions which cannot be abstracted into modulefiles.
If you want to use Anaconda, you have the choice of:
- using URCF-installed Anaconda
- using your own private Anaconda installation
- using Anaconda that has been installed in your group directory, to be used by all members of the group
Once you pick one, changing the installation to be used will involve editing your .bashrc file, then logging out and back in.
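To see which Anaconda installation your login script currently points to, you can print the block that "conda init" added to it. This is a sketch under an assumption: it relies on the marker comments "# >>> conda initialize >>>" and "# <<< conda initialize <<<" that recent conda releases write around that block. The function name show_conda_init is hypothetical.

```shell
# Print the block that "conda init" added to a login script, if any.
# Assumes the marker comments written by recent conda releases;
# show_conda_init is a hypothetical helper name.
show_conda_init() {
    rcfile=${1:-$HOME/.bashrc}
    [ -f "$rcfile" ] || return 1
    # Print every line from the opening marker through the closing marker
    sed -n '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</p' "$rcfile"
}

# Usage:
#   show_conda_init ~/.bashrc
```

If the function prints nothing, the login script has no conda block. Otherwise, the path inside the block shows which installation is in use.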
URCF-Installed Anaconda
To use the URCF-installed Anaconda:
[juser@picotte001 ~]$ module load python/anaconda3
[juser@picotte001 ~]$ conda init
then, log out and log back in.
Note that this setup modifies your login script, and sets your environment to use this Anaconda Python version for all future login sessions.
Environments
There are multiple Conda environments installed. Do "conda env list" to show them.
Job Scripts
Job scripts should execute the login script before running Python, or activating environments:
. ~/.bashrc
See example: Slurm - Job Script Example 05a TensorFlow With Anaconda Python
Conda for a Single-User
This is the default behavior of the Anaconda installer.
Download the Individual Edition, execute the installer shell script, and accept all defaults. Once complete, do
conda init
then log out, and log back in.
Caution - Anaconda can consume a lot of disk space:
- on Picotte, home directories have a quota of 64 GB
- on Proteus, home directories have a quota of 15 GB
Anaconda for Multi-user Environments
You may wish to install Anaconda to provide Python or R to your research group. It is best to have one person be the Conda "administrator". To ensure the Anaconda installation is controlled by your selected Anaconda administrator, install a .condarc file in the root of the installation which overrides users' .condarc files.
Please see full details in the official Anaconda documentation: https://conda.io/projects/conda/en/latest/user-guide/configuration/admin-multi-user-install.html
Once it is installed, anyone who wishes to use that Anaconda installation will need to do:
/ifs/groups/someGrp/anaconda3/bin/conda init
and then log out and log back in.
To have each person be able to control their own Python environment (Python version, plus modules), each person should create their own Conda environment. See next section.
Conda Environments
The definitive documentation for Conda environments is available online.[1]
See Available Envs
To see available environments:
conda env list
Anaconda installations on Proteus have multiple environments already created.
Env Indicator
Your shell prompt will have the name of the active environment in parentheses:
(base) [juser@proteusi01 ~]$
Create an Env
Say you want to create a conda env named "r-3.5":
(base) [juser@proteusi01 ~]$ conda create --name r-3.5 r=3.5
In this way, you can have multiple envs to install different versions of Python, or R:
(base) [juser@proteusi01 ~]$ conda create --name r-3.4
(base) [juser@proteusi01 ~]$ conda create --name py37
To create an env and install a specific language or module in one step, specify the name of the package and its version:
(base) [juser@proteusi01 ~]$ conda create --name python37 python=3.7
Switch to the Newly-created Env
(base) [juser@proteusi01 ~]$ conda activate r-3.5
(r-3.5) [juser@proteusi01 ~]$
Note the change in the active env tag in your prompt.
Exit the Env (to "base")
(r-3.5) [juser@proteusi01 ~]$ conda deactivate
(base) [juser@proteusi01 ~]$
Remove an Env
Remove the env "some-env" and delete all files installed for that env:
(base) [juser@proteusi01 ~]$ conda remove --name some-env --all
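In scripts, it can be useful to check whether an env exists before creating, activating, or removing it. The following is a sketch that parses the output of "conda env list". It assumes that output has one env per line with the env name in the first column and header lines starting with "#". The function name conda_env_exists is hypothetical.

```shell
# Return success if a conda env with the given name is listed.
# conda_env_exists is a hypothetical helper; it assumes "conda env list"
# prints one env per line with the name in the first column.
conda_env_exists() {
    # Skip header lines (starting with "#") and blank lines, keep column 1,
    # then look for an exact, literal match of the requested name.
    conda env list | awk '!/^#/ && NF { print $1 }' | grep -qxF -- "$1"
}

# Usage:
#   conda_env_exists r-3.5 || conda create --name r-3.5 r=3.5
```

Envs created with "--prefix" outside the default envs directory may be listed by path rather than by name, so this check would not find them by name.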
Use in Job Scripts
The same steps above apply, except there is no prompt.
#$ -pe shm 16
...
module load python/anaconda3
conda activate r-3.5
R CMD BATCH mycomputation.R
conda deactivate
conda activate python37
python3.7 myothercomputation.py
### NB if these two computations are dependent, i.e. the Python script depends
### on output created by the R script, it is better to run dependent jobs
### rather than putting separate computations into one job. This is because
### the separate computations may have different requirements in terms of
### no. of slots, amount of memory, etc.
Installing Your Own Python
Use Miniconda (see the Miniconda documentation, which includes download links) to install a minimal conda installation. Use your group directory for the installation so that all members of your research group can use the same version. Then, create an env with a full Anaconda installation containing the version of Python you want.
Create a conda env with the version of your choice:
conda create --name somename python=3.7
Where "somename" should be replaced with an appropriate string (no spaces). It is useful to use the Python version, e.g. "python37".
CAUTION trying to install a full Anaconda in your home directory may cause cluster-wide issues due to the 15 GB quota imposed on home directories. The anaconda installer ignores all quota warnings and will continue writing data into the home directory past the quota limit.
Installing Your Own R
This should not be required, unless you need an old version of R. As with all Anaconda installations, it will modify your environment and make it difficult to use system-installed versions of R.
You can use Anaconda to install R.[2] Since Feb 2018, Anaconda uses Microsoft R Open as its default R.[3] MRO is multithreaded, and linked against the Intel® Math Kernel Library (Intel® MKL)[4] for optimized mathematical operations.
However, since Anaconda provides many channels, the conda install command may pick up a build of R that is not MRO. To be certain to pick the correct R, install mro-base as shown below, rather than "r" or "r-base".
First, download and install Anaconda from:
https://www.anaconda.com/distribution/
Select the Linux edition, "Python 3.7" version, "64-Bit (x86) Installer". The installer is a file named something like:
Anaconda3-2018.12-Linux-x86_64.sh
Run the installer by doing:
sh Anaconda3-2018.12-Linux-x86_64.sh
It will prompt you for an installation location. By default, that location will be something like "/home/myname/anaconda3". Please DO NOT install to your home directory: the default Anaconda installation will exceed the 15 GB quota on home directories. The Anaconda installer does not catch the repeated warning messages during installation, and the storage server becomes overloaded handling these errors, causing a cluster-wide slowdown.
Instead, install into your group directory:
/mnt/HA/groups/myrsrchGrp/anaconda3
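A small guard can reduce the chance of installing under the home directory by accident. The following is a sketch, assuming the installer accepts "-p PREFIX" to set the install location (as the Anaconda and Miniconda shell installers do). The function name prefix_outside_home is hypothetical.

```shell
# Succeed only if the proposed install prefix is NOT inside the home directory.
# prefix_outside_home is a hypothetical helper name.
prefix_outside_home() {
    case "$1" in
        # The home directory itself, or anything below it, is rejected
        "$HOME"|"$HOME"/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage, with the group directory from the text above:
#   prefix=/mnt/HA/groups/myrsrchGrp/anaconda3
#   prefix_outside_home "$prefix" && sh Anaconda3-2018.12-Linux-x86_64.sh -p "$prefix"
```

The check is purely textual: it does not resolve symbolic links or relative paths, so pass an absolute path.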
The installer will also modify your ~/.bashrc file, adding commands to set up Anaconda.
NOTE If you install for your group, each member of your group will need to modify their ~/.bashrc file to set up Anaconda. Or, you can set up a modulefile. See the Environment Modules article for details.
Once the installation is done, complete setup by doing:
. ~/.bashrc
The latest version of conda will, by default, prepend the conda env name to the prompt. The default conda env is named "base" so you will see your prompt change to:
(base) [juser@proteusi01]$
Next, update the conda command, which is how packages are managed:
conda update -n base -c defaults conda
For the following, we install R into the "base" environment. If you wish to use both R and Python from Anaconda, it would be better to create a separate conda env for each.
Install MRO:
conda install -n base -c r mro-base mro-basics
NB There are other R packages available via conda, e.g. "r-base" with "r-essentials". Those do not seem to install MRO.
Once the installation is complete, run R to see its welcome message and version:
(base) [juser@proteusi01 ~]$ R
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Microsoft R Open 3.5.1
The enhanced R distribution from Microsoft
Microsoft packages Copyright (C) 2018 Microsoft Corporation
Using the Intel MKL for parallel mathematical computing (using 16 cores).
Default CRAN mirror snapshot taken on 2018-08-01.
See: https://mran.microsoft.com/.
>
The important bit is
Microsoft R Open 3.5.1
The enhanced R distribution from Microsoft
Microsoft packages Copyright (C) 2018 Microsoft Corporation
Using the Intel MKL for parallel mathematical computing (using 16 cores).
which confirms that you have MRO, and it uses MKL in multithreaded (parallel) mode.
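The same check can be scripted by searching R's startup output for the MRO banner line. This is a sketch under an assumption: it supposes that R prints its startup banner, including the "Microsoft R Open" line shown above, when it reads commands from standard input. The function name is_mro_banner is hypothetical.

```shell
# Succeed if the text on standard input contains the MRO banner line.
# is_mro_banner is a hypothetical helper name.
is_mro_banner() {
    grep -q 'Microsoft R Open'
}

# Usage (assumes R prints its startup banner when reading commands from stdin):
#   echo 'q()' | R --no-save 2>&1 | is_mro_banner && echo "MRO detected"
```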
Installing R Packages
Once you have Anaconda MRO installed, it is important that you install R packages via Anaconda. E.g. if you need to install the "ggplot2" R package, you should do:
(base) [juser@proteusi01]$ conda install -c r r-ggplot2
rather than typing "install.packages("ggplot2")" from within R. Note that package names for conda are the R package names with "r-" prepended.
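The naming rule can be sketched as a small helper. It assumes the conda name is the R package name in lowercase with "r-" prepended. The lowercase part is an inference from names such as "r-rcolorbrewer" and "r-kernsmooth" in the package list further down this page, not a documented guarantee. The function name conda_r_name is hypothetical.

```shell
# Map an R package name to the name conda uses for it.
# conda_r_name is a hypothetical helper; it assumes the convention of a
# lowercase name with "r-" prepended (e.g. RColorBrewer -> r-rcolorbrewer).
conda_r_name() {
    printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Usage:
#   conda install -c r "$(conda_r_name ggplot2)"
```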
Only if the package you want is not available via Conda should you use "install.packages()".
The "-c r" argument to "conda install" says to use the channel named "r". There are also community-contributed channels, such as "conda-forge", but using those may force an update or downgrade of R to a version that is not MRO.
Of course, if you absolutely need a specific package that is not offered by Microsoft's MRAN channel which feeds the Anaconda "r" channel, then you will have to use whichever R version the package requires, even if it is unoptimized and single-threaded.
You can select the Microsoft MRAN instead of CRAN by specifying the "repos" argument:
install.packages("pkg_name_here", repos="https://mran.microsoft.com")
Installing MRO (Microsoft R Open) 3.4.3
Follow these steps if you need R 3.4.x in order to use a specific R package that only works in R 3.4.
NOTE Having multiple Anaconda installations is not a good idea. Use separate Conda environments in order to have different versions of R. See: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#
- Install Anaconda
  - Download from here: https://www.anaconda.com/distribution/#linux
- Log out, and log back in. (Or execute your .bashrc: ". ~/.bashrc")
- Create env:
conda create --name r-3.4
- Switch to env r-3.4:
conda activate r-3.4
- Unload gcc/4.8.1 module (otherwise R will crash immediately):
module unload gcc
- Install R 3.4.3:
conda install -c r mro-base==3.4.3
- Make sure the GCC C and C++ compilers are installed. Use "conda list" to make sure that gcc_linux-64 7.3.0 and gxx_linux-64 7.3.0 are installed.
- Use R's
install.packages("pkg_name", repos="https://mran.microsoft.com", Ncpus=16)
instead of "conda install ..."
  - With care, you can install R packages with conda if you search first for packages with a build ID containing "mro343".
- If a package is not available from Microsoft MRAN (Microsoft R Archive Network), use the standard CRAN (Comprehensive R Archive Network) by omitting the "repos=..." argument.
  - E.g. the "brms" package requires the package "bridgesampling", which is not in MRAN. So, install using:
install.packages("bridgesampling", Ncpus=16, type="source")
which will use CRAN rather than MRAN.
For Hoque Group R 3.4.3
conda install mkl==2019.1=144
conda install blas==1.0=mkl
conda install udunits==2.2.17=6 udunits2==2.2.27.6=h4e0c4b3_1001
- Then, inside R, do:
install.packages("brms", repos="https://mran.microsoft.com", Ncpus=16)
conda install -c r r-latticeextra==0.6_28=mro343h889e2dd_0
conda install -c conda-forge r-ade4==1.7_13=r341hc070d10_0
conda install -c r r-lme4==1.1_15=mro343h599a50d_0
conda install -c r r-ggplot2==2.2.1=mro343h889e2dd_0
For Hoque Group R 3.5.1
DO NOT REINSTALL ANACONDA OR R OR ANY PACKAGES. DO NOT MODIFY THE EXISTING ANACONDA/R INSTALLATION. DO NOT USE "install()" COMMANDS IN YOUR R SCRIPTS.
TO USE:
- Make sure your ~/.bashrc has the following as its last line:
### in the following line, note the initial "."
. /mnt/HA/groups/hoqueGrp/etc/anaconda3_setup.sh
DO NOT DO ANY OF THE FOLLOWING UNLESS YOU KNOW EXACTLY WHAT YOU ARE DOING.
conda create -n r-3.5 ### see below if this creates an env in ~/.conda/envs
conda activate r-3.5
conda install --channel r mro-base==3.5.1 mro-basics==3.5.1
conda install --channel r r-udunits2 r-units
conda install libgdal
May need to force the location of the env by doing:
conda create --prefix /mnt/HA/groups/hoqueGrp/Applications/anaconda3/envs/r-3.5
If you don't do this, the env will be installed for yourself only in
~/.conda/envs/
To install the following packages with "conda install --channel r", put the list in a file and install everything in one command, "conda install --channel r $( cat pkglist.txt | xargs )":
r-assertthat=0.2.0=mro351hf348343_0
r-boot=1.3_20=mro351_0
r-checkpoint=0.4.4=mro351_0
r-class=7.3_14=mro351hd10c6a6_0
r-cli=1.0.0=mro351hf348343_0
r-cluster=2.0.7_1=mro351hac1494b_0
r-codetools=0.2_15=mro351hf348343_0
r-colorspace=1.3_2=mro351hd10c6a6_0
r-crayon=1.3.4=mro351hf348343_0
r-curl=3.2=mro351hd10c6a6_1
r-deployrrserve=9.0.0=mro351_0
r-devtools=1.13.6=mro351hf348343_0
r-dichromat=2.0_0=mro351hf348343_0
r-digest=0.6.15=mro351hd10c6a6_0
r-doparallel=1.0.13=mro351_0
r-fansi=0.2.3=mro351hd10c6a6_0
r-foreach=1.5.0=mro351_0
r-foreign=0.8_70=mro351_0
r-gdata=2.18.0=mro351hf348343_0
r-ggplot2=3.0.0=mro351hf348343_0
r-glue=1.3.0=mro351hd10c6a6_0
r-gtable=0.2.0=mro351hf348343_0
r-gtools=3.8.1=mro351hd10c6a6_0
r-iterators=1.0.10=mro351hf348343_0
r-jsonlite=1.5=mro351hd10c6a6_0
r-kernsmooth=2.23_15=mro351hac1494b_0
r-labeling=0.3=mro351hf348343_0
r-lattice=0.20_35=mro351hd10c6a6_0
r-latticeextra=0.6_28=mro351hf348343_0
r-lazyeval=0.2.1=mro351hd10c6a6_0
r-lme4=1.1_17=mro351hebc1506_0
r-lmertest=3.0_1=mro351hf348343_0
r-magrittr=1.5=mro351hf348343_0
r-mass=7.3_49=mro351_0
r-matrix=1.2_14=mro351hac1494b_0
r-mgcv=1.8_23=mro351_0
r-microsoftr=3.5.0.108=mro351_0
r-minqa=1.2.4=mro351h2efac65_0
r-mnormt=1.5_5=mro351hac1494b_0
r-munsell=0.5.0=mro351hf348343_0
r-nlme=3.1_137=mro351hac1494b_0
r-nloptr=1.0.4=mro351hebc1506_0
r-nnet=7.3_12=mro351hd10c6a6_0
r-numderiv=2016.8_1=mro351hf348343_0
r-pillar=1.3.0=mro351hf348343_0
r-plyr=1.8.4=mro351hebc1506_0
r-png=0.1_7=mro351hd10c6a6_0
r-psych=1.8.4=mro351hf348343_0
r-r6=2.2.2=mro351hf348343_0
r-raster=2.6_7=mro351hebc1506_0
r-rcolorbrewer=1.1_2=mro351hf348343_0
r-rcpp=0.12.18=mro351hebc1506_0
r-rcppeigen=0.3.3.4.0=mro351h2efac65_0
r-recommended=3.5.1=mro351_0
r-reshape2=1.4.3=mro351hebc1506_0
r-revoioq=10.0.0=mro351_0
r-revomods=11.0.0=mro351_0
r-revoutils=11.0.0=mro351_0
r-revoutilsmath=11.0.0=mro351_0
r-rlang=0.2.1=mro351hd10c6a6_0
r-rpart=4.1_13=mro351hd10c6a6_0
r-runit=0.4.26=mro351_0
r-scales=0.5.0=mro351hebc1506_0
r-sp=1.3_1=mro351hd10c6a6_0
r-spatial=7.3_11=mro351_0
r-stringi=1.2.4=mro351hebc1506_0
r-stringr=1.3.1=mro351hf348343_0
r-survival=2.41_3=mro351_0
r-tibble=1.4.2=mro351hd10c6a6_0
r-utf8=1.1.4=mro351hd10c6a6_0
XXX DO NOT INSTALL r-v8=3.0.2=r35h0357c0b_0 ### XXX causes conflicts; need to use a system-installed V8, with a manually installed r-v8 package.
### Microsoft MRAN gives 404 for the source pkg. See below.
r-viridislite=0.3.0=mro351hf348343_0
r-withr=2.1.2=mro351hf348343_0
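If the list above is saved to a file together with its annotation lines (the "XXX" and "###" notes), those lines should not be passed to conda. The following is a sketch that keeps only lines of the form "name=version=build". The function name pkg_specs is hypothetical, and the file name pkglist.txt is the one used in the command above.

```shell
# Print only the "name=version=build" specs from a package list file,
# skipping blank lines and annotation lines such as those starting
# with "#" or "XXX". pkg_specs is a hypothetical helper name.
pkg_specs() {
    grep -E '^[A-Za-z0-9._-]+=[^=[:space:]]+=[^=[:space:]]+$' "$1"
}

# Usage:
#   conda install --channel r $(pkg_specs pkglist.txt)
```

Because the pattern must match the whole line, a spec that is followed by a trailing comment on the same line is skipped as well.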
May need:
- v8 (?)
- V8 needs to be installed on system; i.e. yum install v8 v8-devel; v8 needs to be installed on all compute nodes that will run this R
- Then download V8_3.0.2.tar.gz source tarball -- NB this is the R interface to V8 https://cran.r-project.org/web/packages/V8/index.html
- And install at shell:
[juser@proteusi01 foo]$ R CMD INSTALL --configure-vars='INCLUDE_DIR=/usr/include LIB_DIR=/usr/lib64' \
/mnt/HA/groups/hoqueGrp/Applications/Src/V8_3.0.2.tar.gz
In R -- NOTE: at least some of these packages, while listed in the R script, are not actually needed since the R script runs just fine without them. The list of packages needs to be edited down to those strictly necessary, rather than "just in case".
install.packages("brms", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("spdep", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("maps", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("mapdata", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("ade4", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("ggmap", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("rworldmap", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("HSAR", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("lmtest", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("sandwich", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("sphet", repos="https://mran.microsoft.com", type="source", Ncpus=16)
### XXX NOPE install.packages("McSpatial", repos="https://mran.microsoft.com", type="source", Ncpus=16)
### XXX NOPE install.packages("olsrr", repos="https://mran.microsoft.com", type="source", Ncpus=16)
### XXX INSTALL FAILS install.packages("rgdal", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("languageR", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("usmap", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("USAboundaries", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("sabre", repos="https://mran.microsoft.com", type="source", Ncpus=16)
install.packages("psych", repos="https://mran.microsoft.com", type="source", Ncpus=16)
In job scripts:
...
#$ -pe shm 16
#$ -l vendor=intel
...
. /etc/profile.d/modules.sh
module load shared
module load proteus
module load sge/univa
module unload gcc
# Note the initial "."
. /mnt/HA/groups/hoqueGrp/etc/anaconda3_setup.sh
conda activate r-3.5
which R
R CMD BATCH rscript_name.R
References
[1] Anaconda User Guide - Tasks - Managing Environments
[2] Anaconda User Guide - Using the R Language
[3] Anaconda Blog - Introducing Microsoft R Open (MRO) as Default R for Anaconda Distribution