Installing TensorFlow 2.11.0 using pip and venv

We follow the official instructions for installation via pip, with two changes: we use a pre-installed Python provided by the modulefile python/gcc/3.10, and we use Python virtual environments (venv)[1][2] instead of Miniconda (or Anaconda).

N.B. The TensorFlow package from pip supports both CPU-only and GPU operation. There is no need to install both the "tensorflow" and "tensorflow-gpu" packages: they are identical.

Requirements

Listed requirements (check the support matrix for each):

CUDA driver >= 450.80.02
CUDA Toolkit 11.2
CuDNN 8.1.0
Optional: TensorRT to improve latency and throughput for inference

N.B. Using the above listed requirements will result in warning messages about not being able to find certain library (shared object) files:

2022-12-05 07:15:48.158455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory; ...

~~Actual requirements because TF pip package from PyPI links to CUDA 11.1:~~

CUDA driver >= 450.80.02

CUDA Toolkit 11.1

CuDNN 8.x

~~TensorRT 7.2~~

UPDATE 2023-01-10: It looks like the TF packages on PyPI have been fixed.
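The driver requirement above can be checked from a script rather than by reading nvidia-smi output by eye. The following is a minimal sketch, not part of the official instructions: the helper names (parse_version, driver_is_new_enough, query_driver_version) are made up for illustration, and it assumes nvidia-smi is on the PATH of the GPU node.

```python
import subprocess

MIN_DRIVER = "450.80.02"  # minimum CUDA driver version listed above


def parse_version(text):
    """Turn a string such as '510.39.01' into (510, 39, 1) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(installed, minimum=MIN_DRIVER):
    """Return True if the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def query_driver_version():
    """Ask nvidia-smi for the driver version; return None if it cannot be run."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi is not available; run this on a GPU node")
    else:
        status = "OK" if driver_is_new_enough(version) else "too old"
        print(f"driver {version}: {status}")
```

Comparing the version as a tuple of integers avoids the pitfalls of string comparison, where "99.0" would sort after "450.80.02".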

Interactive session on GPU node

Run an interactive shell on a GPU node:

[juser@picotte001 ~]$ srun -p gpu --gpus-per-node=1 --mem-per-gpu=16G --cpus-per-gpu=12 --time=2:00:00 --pty /bin/bash
[juser@gpu005 ~]$

Check the number of GPUs assigned. You should see only one, with ID number 0:

[juser@gpu005 ~]$ nvidia-smi
Mon Feb 27 17:15:18 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   37C    P0    41W / 300W |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Load requirements

Set up environment and load appropriate modulefiles:

[juser@gpu005 ~]$ module use /ifs/opt_cuda/modulefiles
[juser@gpu005 ~]$ module load python/gcc/3.10
[juser@gpu005 ~]$ module load cuda11.2/toolkit cuda11.2/blas cuda11.2/fft tensorrt-cuda11.2/7.2.3.4 cudnn8.7-cuda11.2 cutensor-cuda11.2

Set up Python virtual environment (venv)

[juser@gpu005 ~]$ cd /ifs/groups/myrsrchGrp
[juser@gpu005 myrsrchGrp]$ mkdir venvs
[juser@gpu005 myrsrchGrp]$ python3 -m venv ./venvs/py310-tf211
[juser@gpu005 myrsrchGrp]$ source ./venvs/py310-tf211/bin/activate
(py310-tf211) [juser@gpu005 myrsrchGrp]$

Note the change of prompt: the venv name "(py310-tf211)" is added.

Next, check that the venv is active by looking at the location of the python3 executable.

(py310-tf211) [juser@gpu005 myrsrchGrp]$ which python3
/ifs/groups/myrsrchGrp/venvs/py310-tf211/bin/python3
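The same check can be made from inside Python, which is handy in scripts where the shell prompt is not visible. This is a small sketch using only the standard library; the function name in_virtualenv is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the Python installation the venv
    # was created from. Outside a venv the two are equal.
    return sys.prefix != sys.base_prefix


print("executable:", sys.executable)
print("venv active:", in_virtualenv())
```

With the venv from this page active, the executable path printed should be the one under venvs/py310-tf211/bin.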

Update pip and setuptools (because there is a critical setuptools security fix):

(py310-tf211) [juser@gpu005 myrsrchGrp]$ python3 -m pip install -U pip setuptools
Requirement already satisfied: pip in ./venvs/py310-tf211/lib/python3.10/site-packages (22.2.2)
Collecting pip
  Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Requirement already satisfied: setuptools in ./venvs/py310-tf211/lib/python3.10/site-packages (63.2.0)
Collecting setuptools
  Using cached setuptools-65.6.3-py3-none-any.whl (1.2 MB)
Installing collected packages: setuptools, pip
  Attempting uninstall: setuptools
    Found existing installation: setuptools 63.2.0
    Uninstalling setuptools-63.2.0:
      Successfully uninstalled setuptools-63.2.0
  Attempting uninstall: pip
    Found existing installation: pip 22.2.2
    Uninstalling pip-22.2.2:
      Successfully uninstalled pip-22.2.2
Successfully installed pip-22.3.1 setuptools-65.6.3

Install TensorFlow

Install TensorFlow 2.11.0 using pip:

(py310-tf211) [juser@gpu005 myrsrchGrp]$ python3 -m pip install tensorflow==2.11.0
Collecting tensorflow==2.11.0
  Downloading tensorflow-2.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 588.3/588.3 MB 3.7 MB/s eta 0:00:00
...
Installing collected packages: tensorboard-plugin-wit, pyasn1, libclang, flatbuffers, wrapt, wheel, urllib3, typing-extensions, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, six, rsa, pyparsing, pyasn1-modules, protobuf, oauthlib, numpy, MarkupSafe, markdown, keras, idna, grpcio, gast, charset-normalizer, certifi, cachetools, absl-py, werkzeug, requests, packaging, opt-einsum, h5py, google-pasta, google-auth, astunparse, requests-oauthlib, google-auth-oauthlib, tensorboard, tensorflow
Successfully installed MarkupSafe-2.1.1 absl-py-1.3.0 astunparse-1.6.3 cachetools-5.2.0 certifi-2022.9.24 charset-normalizer-2.1.1 flatbuffers-22.11.23 gast-0.4.0 google-auth-2.15.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.51.1 h5py-3.7.0 idna-3.4 keras-2.11.0 libclang-14.0.6 markdown-3.4.1 numpy-1.23.5 oauthlib-3.2.2 opt-einsum-3.3.0 packaging-21.3 protobuf-3.19.6 pyasn1-0.4.8 pyasn1-modules-0.2.8 pyparsing-3.0.9 requests-2.28.1 requests-oauthlib-1.3.1 rsa-4.9 six-1.16.0 tensorboard-2.11.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.11.0 tensorflow-estimator-2.11.0 tensorflow-io-gcs-filesystem-0.28.0 termcolor-2.1.1 typing-extensions-4.4.0 urllib3-1.26.13 werkzeug-2.2.2 wheel-0.38.4 wrapt-1.14.1

Test TensorFlow

Run a simple one-line test to create a random 1000x1000 tensor and perform a reduce_sum():

(py310-tf211) [juser@gpu005 myrsrchGrp]$ python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2023-01-10 18:31:18.187484: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-10 18:31:18.290904: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2.11.0
2023-01-10 18:31:24.326718: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-10 18:31:24.840312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30972 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:3b:00.0, compute capability: 7.0
tf.Tensor(354.97415, shape=(), dtype=float32)

The result tf.Tensor(354.97415, shape=(), dtype=float32) will be different for you because the tensor is filled with random values.
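The one-line test shows that a GPU device was created only through the log messages. A more explicit check is sketched below. It uses tf.config.list_physical_devices and tf.sysconfig.get_build_info; the function name gpu_report is illustrative, and the build-info keys are read with .get() on the assumption that a given build may not report all of them.

```python
def gpu_report():
    """Describe the GPUs TensorFlow can see; return None if TF cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.list_physical_devices("GPU")
    build = tf.sysconfig.get_build_info()
    return {
        "tf_version": tf.__version__,
        "gpu_names": [gpu.name for gpu in gpus],
        # CUDA and cuDNN versions the installed wheel was built against
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
    }


report = gpu_report()
if report is None:
    print("TensorFlow is not importable; is the venv active?")
else:
    for key, value in report.items():
        print(f"{key}: {value}")
```

An empty gpu_names list on a GPU node usually means the CUDA modulefiles were not loaded before Python started.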

For interactive use, remember to deactivate the venv once you are done with TensorFlow:

(py310-tf211) [juser@gpu005 myrsrchGrp]$ deactivate
[juser@gpu005 myrsrchGrp]$ which python3
/ifs/opt/python/gcc/3.10.2/bin/python3

Note that the prompt loses the "(py310-tf211)" tag.

Job scripts

Job scripts will need to set up the same environment before running the Python script.

Example Python script

Create and save this file as "test_tf.py":

#!/usr/bin/env python3
import tensorflow as tf

print(tf.__version__)
print(tf.reduce_sum(tf.random.normal([1000, 1000])))

Example job script

In the same directory as the test_tf.py file above, create a job script named "tf_job.sh" to run the TensorFlow computation:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-gpu=12
#SBATCH --mem-per-gpu=40G
#SBATCH --time=0:15:00

module use /ifs/opt_cuda/modulefiles
module load python/gcc/3.10
module load cuda11.2/toolkit cuda11.2/blas cuda11.2/fft tensorrt-cuda11.2/7.2.3.4 cudnn8.7-cuda11.2 cutensor-cuda11.2

# activate TF venv
source /ifs/groups/myrsrchGrp/venvs/py310-tf211/bin/activate

python3 test_tf.py

Submit the job:

[juser@picotte001 ~]$ sbatch tf_job.sh

The output will be in a file named "slurm-NNNNNNN.out" where "NNNNNNN" is the job ID. Its contents should be something like:

2022-12-05 07:51:42.721183: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-05 07:51:42.841257: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2.11.0
2022-12-05 07:51:44.995935: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-05 07:51:45.495916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30972 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
tf.Tensor(213.94812, shape=(), dtype=float32)
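When a batch job does not find a GPU, it can help to log what Slurm actually allocated at the top of the Python script. The sketch below prints a few environment variables; it assumes Slurm sets SLURM_CPUS_PER_GPU and SLURM_MEM_PER_GPU when the matching options are requested and exports CUDA_VISIBLE_DEVICES for GPU allocations, which may depend on the cluster configuration. The function name allocation_summary is illustrative.

```python
import os

# Variables of interest; any that Slurm did not set are reported as "<unset>".
SLURM_VARS = (
    "SLURM_JOB_ID",
    "SLURM_JOB_NODELIST",
    "SLURM_CPUS_PER_GPU",
    "SLURM_MEM_PER_GPU",
    "CUDA_VISIBLE_DEVICES",
)


def allocation_summary(env=os.environ):
    """Return a dict of the job's allocation-related environment variables."""
    return {name: env.get(name, "<unset>") for name in SLURM_VARS}


for name, value in allocation_summary().items():
    print(f"{name}={value}")
```

These lines end up in the same slurm-NNNNNNN.out file as the TensorFlow output, so the allocation and the result can be read together.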

CAUTIONS

XLA_FLAGS environment variable

Some TensorFlow applications need the XLA_FLAGS environment variable set so that XLA can locate the CUDA Toolkit installation:

export XLA_FLAGS="--xla_gpu_cuda_data_dir=/cm/shared/apps/cuda11.2"

Set it in your job script, before the line that runs your TensorFlow code.
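As an alternative to the export line, the variable can be set at the top of the Python script itself. This is a sketch under the assumption that XLA reads XLA_FLAGS when TensorFlow initializes, so the variable has to be set before tensorflow is imported. The function name add_xla_cuda_data_dir is illustrative, and the path is the one from the export line above.

```python
import os

CUDA_DATA_DIR = "/cm/shared/apps/cuda11.2"  # path taken from the export line above


def add_xla_cuda_data_dir(cuda_dir=CUDA_DATA_DIR, env=os.environ):
    """Append --xla_gpu_cuda_data_dir to XLA_FLAGS unless one is already present."""
    flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
    current = env.get("XLA_FLAGS", "")
    if "--xla_gpu_cuda_data_dir" not in current:
        # Keep any flags that were already set and add ours at the end.
        env["XLA_FLAGS"] = f"{current} {flag}".strip()
    return env["XLA_FLAGS"]


print(add_xla_cuda_data_dir())
# import tensorflow as tf   # import only after XLA_FLAGS has been set
```

Appending instead of overwriting preserves any XLA flags already exported in the job script.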

Missing libdevice.10.bc

Even when the environment variables that define the path to the CUDA Toolkit installation are set correctly, TF can have trouble finding the library file libdevice.10.bc.

The workaround is to copy it to the same directory as your Python TF script:

[juser@gpu001 ~]$ cp $CUDA_DIR/nvvm/libdevice/libdevice.10.bc .
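The copy can also be done from Python, for example at the start of a script that runs in a fresh working directory. This is a minimal sketch: the function name copy_libdevice is illustrative, and it assumes the CUDA_DIR environment variable is defined by the loaded CUDA modulefile, as the cp command above does.

```python
import os
import shutil
from pathlib import Path


def copy_libdevice(cuda_dir, dest_dir="."):
    """Copy libdevice.10.bc from a CUDA Toolkit tree into dest_dir.

    Returns the path of the copy, or None if the source file does not exist.
    """
    source = Path(cuda_dir) / "nvvm" / "libdevice" / "libdevice.10.bc"
    if not source.is_file():
        return None
    return Path(shutil.copy2(source, dest_dir))


if __name__ == "__main__":
    cuda_dir = os.environ.get("CUDA_DIR")
    if cuda_dir is None:
        print("CUDA_DIR is not set; load the CUDA modulefiles first")
    else:
        copied = copy_libdevice(cuda_dir)
        print(copied if copied else "libdevice.10.bc not found under " + cuda_dir)
```

Returning None instead of raising lets the calling script decide whether a missing file is fatal.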

Examples

References

[1] Python 3.10 Documentation - venv

[2] Real Python - Python Virtual Environments: A Primer