Installing TensorFlow 2.10.1 using pip and venv

We follow the official TensorFlow instructions for installation via pip, with two differences: we use the pre-installed Python provided by the modulefile python/gcc/3.10, and we use Python virtual environments (venv)[1][2] instead of Miniconda (or Anaconda).

N.B. The "tensorflow" package from pip supports both CPU-only and GPU use. There is no need to also install the "tensorflow-gpu" package: for this version the two packages are identical.
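To check which TensorFlow distributions are installed in the active environment (for example, to spot a leftover "tensorflow-gpu" next to "tensorflow"), a minimal sketch using the standard library's importlib.metadata could look like this. It only reads package metadata and does not import TensorFlow; the function name is hypothetical.

```python
from importlib import metadata

def installed_tf_distributions(names=("tensorflow", "tensorflow-gpu")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

if __name__ == "__main__":
    for name, version in installed_tf_distributions().items():
        print(f"{name}: {version or 'not installed'}")
```

If both entries show a version, uninstalling "tensorflow-gpu" keeps the environment unambiguous.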

Remove conda

Before doing anything else, check your ~/.bashrc file for the Anaconda setup block below. If it exists, comment it out or delete it, then start a new shell (or log out and back in) so the change takes effect:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/ifs/opt/python/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/ifs/opt/python/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/ifs/opt/python/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/ifs/opt/python/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
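If you prefer to script this edit, here is a minimal sketch in Python. It assumes the block is delimited by the marker lines that conda init writes, "# >>> conda initialize >>>" and "# <<< conda initialize <<<", and it leaves the text untouched unless both markers are present. As written it is a dry run that only prints the modified text; writing the result back to ~/.bashrc (ideally after saving a backup copy) is left to you. The function name is hypothetical.

```python
from pathlib import Path

START = "# >>> conda initialize >>>"
END = "# <<< conda initialize <<<"

def comment_out_conda_block(text):
    """Return (new_text, changed) with the conda init block commented out.

    Only acts when both marker lines are present; lines that are already
    comments (including the markers themselves) are left as they are.
    """
    lines = text.splitlines(keepends=True)
    stripped = [line.strip() for line in lines]
    if START not in stripped or END not in stripped:
        return text, False  # no complete block: do not touch the text
    out, inside, changed = [], False, False
    for line, s in zip(lines, stripped):
        if s == START:
            inside = True
        if inside and s and not s.startswith("#"):
            line = "# " + line  # comment out an active line inside the block
            changed = True
        out.append(line)
        if s == END:
            inside = False
    return "".join(out), changed

if __name__ == "__main__":
    bashrc = Path.home() / ".bashrc"
    if bashrc.exists():
        new_text, changed = comment_out_conda_block(bashrc.read_text())
        # Dry run: show the result instead of overwriting ~/.bashrc.
        print(new_text if changed else "No active conda block found.")
```

Running it a second time reports no change, since every line in the block is then already a comment.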

Requirements

The official documentation lists these requirements (check the support matrix for each):

CUDA driver >= 450.80.02
CUDA Toolkit 11.2
cuDNN 8.1.0
Optional: TensorRT to improve latency and throughput for inference

N.B. With the versions listed above, TensorFlow prints warnings that it cannot load certain shared libraries (shared object files):

2022-12-05 07:15:48.158455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory; ...

The actual requirements differ, because the TensorFlow pip package from PyPI links to CUDA 11.1:

CUDA driver >= 450.80.02
CUDA Toolkit 11.1
cuDNN 8.0
TensorRT 7.2
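One way to see which of these libraries the dynamic loader can actually find is to try loading each one with ctypes, which goes through the same dlopen() mechanism. This is a sketch: libnvinfer.so.7 and libnvrtc.so.11.1 come from the warning above, while the other sonames (libcudart.so.11.0, libcudnn.so.8, libnvinfer_plugin.so.7) are assumptions about what a CUDA 11.x / cuDNN 8 / TensorRT 7 install provides. Run it after loading the modulefiles (see "Load requirements" below), since LD_LIBRARY_PATH is read when the Python process starts. The function name is hypothetical.

```python
import ctypes

# libnvinfer.so.7 and libnvrtc.so.11.1 appear in the warning above; the other
# sonames are assumptions for a CUDA 11.x / cuDNN 8 / TensorRT 7 install.
SONAMES = (
    "libcudart.so.11.0",
    "libcudnn.so.8",
    "libnvrtc.so.11.1",
    "libnvinfer.so.7",
    "libnvinfer_plugin.so.7",
)

def probe_shared_libraries(sonames=SONAMES):
    """Try to dlopen each library; map soname -> None (loaded) or the error text."""
    status = {}
    for soname in sonames:
        try:
            ctypes.CDLL(soname)
            status[soname] = None
        except OSError as err:
            status[soname] = str(err)  # the loader's message, like the dlerror above
    return status

if __name__ == "__main__":
    for soname, error in probe_shared_libraries().items():
        print(f"{soname}: {'found' if error is None else error}")
```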

Interactive session on GPU node

Run an interactive shell on a GPU node:

[juser@picotte001 ~]$ srun -p gpu --gres=gpu:1 --mem-per-gpu=16G --cpus-per-gpu=12 --time=2:00:00 --pty /bin/bash
[juser@gpu005 ~]$
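Inside the interactive shell, Slurm sets environment variables describing the allocation. The sketch below reads SLURM_JOB_ID, SLURMD_NODENAME and CUDA_VISIBLE_DEVICES (normally set by Slurm when a GPU is allocated through --gres) to confirm you are on a GPU node with a GPU assigned. The function name is hypothetical, and it only reports what is present.

```python
import os

def slurm_gpu_context(env=None):
    """Summarize the Slurm allocation seen by the current shell or job."""
    env = os.environ if env is None else env
    gpus = env.get("CUDA_VISIBLE_DEVICES", "")
    return {
        "job_id": env.get("SLURM_JOB_ID"),
        "node": env.get("SLURMD_NODENAME"),
        # An empty list means no GPU is visible to this process.
        "visible_gpus": [g for g in gpus.split(",") if g],
    }

if __name__ == "__main__":
    print(slurm_gpu_context())
```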

Load requirements

Set up the environment and load the appropriate modulefiles:

[juser@gpu005 ~]$ module use /ifs/opt_cuda/modulefiles
[juser@gpu005 ~]$ module load python/gcc/3.10
[juser@gpu005 ~]$ module load cuda11.1/toolkit cuda11.1/blas cuda11.1/fft cudnn8.0-cuda11.1 tensorrt-cuda11.1/7.2.3.4
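Assuming these modulefiles prepend their library directories to LD_LIBRARY_PATH (typical for modulefiles, but check with "module show"), this sketch lists which directories on LD_LIBRARY_PATH contain a given shared object file. The helper name is hypothetical.

```python
import os
from pathlib import Path

def find_on_ld_library_path(soname, ld_library_path=None):
    """Return the LD_LIBRARY_PATH directories that contain a file named `soname`."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    return [
        d for d in ld_library_path.split(os.pathsep)
        if d and (Path(d) / soname).exists()  # skip empty entries such as "a::b"
    ]

if __name__ == "__main__":
    for soname in ("libcudnn.so.8", "libnvinfer.so.7"):
        print(soname, find_on_ld_library_path(soname) or "not found")
```

Once tensorrt-cuda11.1/7.2.3.4 is loaded, libnvinfer.so.7 should be reported in that module's library directory.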

Set up Python virtual environment (venv)

[juser@gpu005 ~]$ cd /ifs/groups/myrsrchGrp
[juser@gpu005 myrsrchGrp]$ mkdir venvs
[juser@gpu005 myrsrchGrp]$ python3 -m venv ./venvs/py310-tf210
[juser@gpu005 myrsrchGrp]$ source ./venvs/py310-tf210/bin/activate
(py310-tf210) [juser@gpu005 myrsrchGrp]$

Note the change in the prompt: the venv name "(py310-tf210)" is prepended.

Next, check that the venv is active by looking at the location of the python3 executable.

(py310-tf210) [juser@gpu005 myrsrchGrp]$ which python3
/ifs/groups/myrsrchGrp/venvs/py310-tf210/bin/python3
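The same check can be done from inside Python: in a venv, sys.prefix points at the venv directory, while sys.base_prefix still points at the interpreter the venv was created from (here, the one from the python/gcc/3.10 module). A minimal sketch, with a hypothetical function name:

```python
import sys

def in_venv():
    """True when running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("venv active:", in_venv())
    print("sys.prefix:", sys.prefix)            # the venv directory when active
    print("sys.base_prefix:", sys.base_prefix)  # the base interpreter's prefix
```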

Update pip and setuptools

(py310-tf210) [juser@gpu005 myrsrchGrp]$ python3 -m pip install -U pip setuptools
Collecting pip
  Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Requirement already satisfied: setuptools in ./venvs/py310-tf210/lib/python3.10/site-packages (63.2.0)
Collecting setuptools
  Using cached setuptools-65.6.3-py3-none-any.whl (1.2 MB)
Installing collected packages: setuptools, pip
  Attempting uninstall: setuptools
    Found existing installation: setuptools 63.2.0
    Uninstalling setuptools-63.2.0:
      Successfully uninstalled setuptools-63.2.0
  Attempting uninstall: pip
    Found existing installation: pip 22.2.2
    Uninstalling pip-22.2.2:
      Successfully uninstalled pip-22.2.2
Successfully installed pip-22.3.1 setuptools-65.6.3

Install TensorFlow

Install TensorFlow 2.10.1 using pip:

(py310-tf210) [juser@gpu005 myrsrchGrp]$ python3 -m pip install tensorflow==2.10.1
Collecting tensorflow==2.10.1
  Downloading tensorflow-2.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (578.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 578.1/578.1 MB 4.0 MB/s eta 0:00:00
...
Installing collected packages: tensorboard-plugin-wit, pyasn1, libclang, keras, flatbuffers, wrapt, wheel, urllib3, typing-extensions, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, six, rsa, pyparsing, pyasn1-modules, protobuf, oauthlib, numpy, MarkupSafe, markdown, idna, grpcio, gast, charset-normalizer, certifi, cachetools, absl-py, werkzeug, requests, packaging, opt-einsum, keras-preprocessing, h5py, google-pasta, google-auth, astunparse, requests-oauthlib, google-auth-oauthlib, tensorboard, tensorflow
Successfully installed MarkupSafe-2.1.1 absl-py-1.3.0 astunparse-1.6.3 cachetools-5.2.0 certifi-2022.9.24 charset-normalizer-2.1.1 flatbuffers-22.11.23 gast-0.4.0 google-auth-2.15.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.51.1 h5py-3.7.0 idna-3.4 keras-2.10.0 keras-preprocessing-1.1.2 libclang-14.0.6 markdown-3.4.1 numpy-1.23.5 oauthlib-3.2.2 opt-einsum-3.3.0 packaging-21.3 protobuf-3.19.6 pyasn1-0.4.8 pyasn1-modules-0.2.8 pyparsing-3.0.9 requests-2.28.1 requests-oauthlib-1.3.1 rsa-4.9 six-1.16.0 tensorboard-2.10.1 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.10.1 tensorflow-estimator-2.10.0 tensorflow-io-gcs-filesystem-0.28.0 termcolor-2.1.1 typing-extensions-4.4.0 urllib3-1.26.13 werkzeug-2.2.2 wheel-0.38.4 wrapt-1.14.1
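If you save this output (for example, in a job's output file), a small sketch like the following can pull the package versions out of the "Successfully installed" line, which is handy for confirming that pip resolved tensorflow 2.10.1 and for recording the other versions. It assumes each entry has the form name-version with no hyphen inside the version, which holds for the line above; the function name is hypothetical. For a live environment, "python3 -m pip freeze" reports the same information directly.

```python
def parse_pip_installed(line):
    """Parse pip's 'Successfully installed name-version ...' line into a dict."""
    prefix = "Successfully installed "
    if not line.startswith(prefix):
        raise ValueError("not a pip 'Successfully installed' line")
    packages = {}
    for item in line[len(prefix):].split():
        # Split on the last hyphen: names may contain hyphens, versions here do not.
        name, _, version = item.rpartition("-")
        packages[name] = version
    return packages

if __name__ == "__main__":
    sample = "Successfully installed keras-2.10.0 tensorboard-plugin-wit-1.8.1 tensorflow-2.10.1"
    print(parse_pip_installed(sample))
```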

Test TensorFlow

Run a simple one-line test to create a random 1000x1000 tensor and perform a reduce_sum():

(py310-tf210) [juser@gpu005 myrsrchGrp]$ python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2022-12-05 08:12:58.275229: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-05 08:12:58.396736: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-12-05 08:12:58.425008: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2.10.1
2022-12-05 08:13:03.816594: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-05 08:13:04.402772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30976 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:18:00.0, compute capability: 7.0
tf.Tensor(-424.14557, shape=(), dtype=float32)

The result tf.Tensor(-424.14557, shape=(), dtype=float32) will be different for you since it is a random tensor.
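To get a repeatable number (useful when comparing runs), set TensorFlow's global random seed with tf.random.set_seed() before drawing the tensor. A sketch, assuming the venv above is active; the seed 1234 and the function name are arbitrary, and the function returns None if TensorFlow cannot be imported. Repeated runs on the same setup should then agree, though values may still differ between CPU and GPU or across TensorFlow versions.

```python
def seeded_reduce_sum(seed=1234):
    """Repeat the one-line test with a fixed global seed; None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # not running inside the TensorFlow venv
    tf.random.set_seed(seed)  # fix the global seed so the draw is repeatable
    x = tf.random.normal([1000, 1000])
    return float(tf.reduce_sum(x))

if __name__ == "__main__":
    print(seeded_reduce_sum(), seeded_reduce_sum())
```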

The message about "Unable to register cuBLAS factory" can be ignored.[3]
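Log lines like these share a common prefix: a timestamp, a one-letter severity (I for info, W for warning, E for error, F for fatal), and the source file and line. The sketch below groups a captured log by severity so that real errors stand out from the benign cuBLAS message. The regular expression is inferred from the lines shown above, not from a TensorFlow specification, and the function name is hypothetical.

```python
import re
from collections import Counter

# Inferred from the lines above: "<date> <time>: <severity> <file>:<line>] <message>"
TF_LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) ([^:\s]+):(\d+)\] (.*)$"
)

def summarize_tf_log(text):
    """Count TensorFlow C++ log lines by severity and collect E/F messages."""
    counts = Counter()
    problems = []
    for line in text.splitlines():
        match = TF_LOG_LINE.match(line)
        if match is None:
            continue  # program output or a wrapped continuation line
        severity, source, lineno, message = match.groups()
        counts[severity] += 1
        if severity in ("E", "F"):
            problems.append(f"{source}:{lineno}: {message}")
    return counts, problems
```

To hide these C++ messages instead, set the environment variable TF_CPP_MIN_LOG_LEVEL before TensorFlow is imported: 1 hides I messages, 2 also hides W, and 3 also hides E, which should include the cuBLAS message.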

Deactivate venv

For interactive use, remember to deactivate the venv once you are done with TensorFlow:

(py310-tf210) [juser@gpu005 myrsrchGrp]$ deactivate
[juser@gpu005 myrsrchGrp]$ which python3
/ifs/opt/python/gcc/3.10.2/bin/python3

Note that the prompt loses the "(py310-tf210)" tag.

Job scripts

Job scripts must set up the same environment (modulefiles and venv) before running the Python script.

Example Python script

Create and save this file as "test_tf.py":

#!/usr/bin/env python3
import tensorflow as tf

print(tf.__version__)
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
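In batch jobs it can help to report clearly whether TensorFlow can see a GPU, rather than silently running on the CPU. A sketch of a helper that could be added to test_tf.py: tf.config.list_physical_devices("GPU") is TensorFlow's call for listing GPUs, while the name check_gpus and its messages are hypothetical.

```python
def check_gpus():
    """Return (ok, message) describing the GPUs TensorFlow can see."""
    try:
        import tensorflow as tf
    except ImportError:
        return False, "TensorFlow cannot be imported: is the venv activated?"
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return False, "No GPU visible to TensorFlow: check --gres and the CUDA modulefiles"
    return True, "GPUs visible to TensorFlow: " + ", ".join(gpu.name for gpu in gpus)

if __name__ == "__main__":
    ok, message = check_gpus()
    print(message)
```

In a job, you could print the message and exit with a nonzero status when ok is False, so the failure also shows up in the job's exit code.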

Example job script

In the same directory as test_tf.py, create a job script named "tf_job.sh" to run the above TensorFlow computation:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-gpu=12
#SBATCH --mem-per-gpu=40G
#SBATCH --time=0:15:00

module use /ifs/opt_cuda/modulefiles
module load python/gcc/3.10
module load cuda11.1/toolkit cuda11.1/blas cuda11.1/fft cudnn8.0-cuda11.1 tensorrt-cuda11.1/7.2.3.4

# activate TF venv
source /ifs/groups/myrsrchGrp/venvs/py310-tf210/bin/activate

python3 test_tf.py

Submit the job:

[juser@picotte001 ~]$ sbatch tf_job.sh
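If you submit from a script, the job ID can be taken from sbatch's confirmation line ("Submitted batch job <jobid>"), and the default output file name follows from it. A sketch, assuming sbatch is on PATH and that the job script does not override the output file with --output; both function names are hypothetical. Alternatively, "sbatch --parsable" prints just the job ID, which avoids the regular expression.

```python
import re
import subprocess

def parse_sbatch_output(stdout):
    """Extract the job ID from sbatch's confirmation and the default output file name."""
    match = re.search(r"Submitted batch job (\d+)", stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {stdout!r}")
    job_id = match.group(1)
    return job_id, f"slurm-{job_id}.out"  # Slurm's default pattern is slurm-%j.out

def submit(job_script="tf_job.sh"):
    """Submit a job script with sbatch and parse its reply."""
    result = subprocess.run(
        ["sbatch", job_script], capture_output=True, text=True, check=True
    )
    return parse_sbatch_output(result.stdout)
```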

The output will be in a file named "slurm-NNNNNNN.out" where "NNNNNNN" is the job ID. Its contents should be something like:

2022-12-05 08:16:01.089706: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-05 08:16:01.215730: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2.10.1
2022-12-05 08:16:01.244829: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-12-05 08:16:04.597296: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-05 08:16:05.093541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30976 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
tf.Tensor(-1939.6317, shape=(), dtype=float32)
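To confirm from the output file that the job actually ran on a GPU, the "Created device" line can be parsed. A sketch; the regular expression is inferred from the line above and may need adjusting for other TensorFlow versions, and the function name is hypothetical. Pass one or more slurm-NNNNNNN.out files on the command line.

```python
import re
import sys

# Inferred from the "Created device" line above.
CREATED_DEVICE = re.compile(
    r"Created device (\S+) with (\d+) MB memory:\s+-> device: (\d+), "
    r"name: (.+?), pci bus id: ([^,]+), compute capability: ([\d.]+)"
)

def gpus_from_output(text):
    """Return one dict per 'Created device' line found in a job's output."""
    devices = []
    for m in CREATED_DEVICE.finditer(text):
        tf_device, memory_mb, index, name, pci_bus_id, capability = m.groups()
        devices.append({
            "tf_device": tf_device,
            "memory_mb": int(memory_mb),
            "index": int(index),
            "name": name,
            "pci_bus_id": pci_bus_id,
            "compute_capability": capability,
        })
    return devices

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            for device in gpus_from_output(f.read()):
                print(path, device)
```

An empty result for a GPU job suggests TensorFlow fell back to the CPU.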


References

[1] Python 3.10 Documentation - venv

[2] Real Python - Python Virtual Environments: A Primer

[3] TensorFlow GitHub issue #57663