Installing TensorFlow 2.11.0 using pip and venv

We follow the official instructions for installing TensorFlow with pip, with two changes: we use the pre-installed Python provided by the modulefile python/gcc/3.10, and we use Python virtual environments (venv)[1][2] instead of Miniconda (or Anaconda).

N.B. The TensorFlow package from pip supports both CPU-only and GPU operation. There is no need to install both the "tensorflow" and "tensorflow-gpu" packages: they are identical.

Requirements

Listed requirements (check the support matrix for each):

CUDA driver >= 450.80.02
CUDA Toolkit 11.2
cuDNN 8.1.0
Optional: TensorRT to improve latency and throughput for inference

N.B. Using the above listed requirements will result in warning messages about not being able to find certain library (shared object) files:

2022-12-05 07:15:48.158455: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory; ...

~~Actual requirements, because the TF pip package from PyPI links to CUDA 11.1:~~

CUDA driver >= 450.80.02
CUDA Toolkit 11.1
cuDNN 8.x
~~TensorRT 7.2~~

UPDATE 2023-01-10: The TF PyPI packages appear to have been fixed.
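
The "Could not load dynamic library" warnings quoted in this section come from the dynamic loader failing to find a shared object. As a rough sketch, the same lookup can be tried from Python with ctypes once the modulefiles are loaded. The library names below are assumptions: libnvinfer.so.7 is taken from the warning above, while libcudart.so.11.0 and libcudnn.so.8 are the usual sonames for CUDA 11.x and cuDNN 8 and may differ on your system.

```python
import ctypes

# Assumed library names; adjust to match the modulefiles you loaded.
LIBRARIES = ["libcudart.so.11.0", "libcudnn.so.8", "libnvinfer.so.7"]


def can_load(name):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


for name in LIBRARIES:
    status = "found" if can_load(name) else "NOT found"
    print(f"{name}: {status}")
```

A "NOT found" line here usually means the corresponding modulefile is not loaded, or that it provides a different library version from the one named.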

Interactive session on GPU node

Run an interactive shell on a GPU node:

[juser@picotte001 ~]$ srun -p gpu --gpus-per-node=1 --mem-per-gpu=16G --cpus-per-gpu=12 --time=2:00:00 --pty /bin/bash
[juser@gpu005 ~]$

Check the number of GPUs assigned. You should see only one, with ID number 0:

[juser@gpu005 ~]$ nvidia-smi
Mon Feb 27 17:15:18 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   37C    P0    41W / 300W |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
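
The GPU assignment can also be inspected from Python. This is a minimal sketch that assumes the scheduler exports the CUDA_VISIBLE_DEVICES environment variable for the job, which is common for Slurm GPU allocations but is not confirmed for this cluster. The helper name visible_gpu_ids is made up for this example.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # The variable is a comma-separated list such as "0" or "0,1".
    return [item.strip() for item in value.split(",") if item.strip()]


ids = visible_gpu_ids()
if ids is None:
    print("CUDA_VISIBLE_DEVICES is not set")
else:
    print(f"{len(ids)} GPU(s) visible to this job: {ids}")
```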

Load requirements

Set up environment and load appropriate modulefiles:

[juser@gpu005 ~]$ module use /ifs/opt_cuda/modulefiles
[juser@gpu005 ~]$ module load python/gcc/3.10
[juser@gpu005 ~]$ module load cuda11.2/toolkit cuda11.2/blas cuda11.2/fft tensorrt-cuda11.2/7.2.3.4 cudnn8.7-cuda11.2 cutensor-cuda11.2

Set up Python virtual environment (venv)

[juser@gpu005 ~]$ cd /ifs/groups/myrsrchGrp
[juser@gpu005 myrsrchGrp]$ mkdir venvs
[juser@gpu005 myrsrchGrp]$ python3 -m venv ./venvs/py310-tf211
[juser@gpu005 myrsrchGrp]$ source ./venvs/py310-tf211/bin/activate
(py310-tf211) [juser@gpu005 myrsrchGrp]$

Note the change of prompt: the venv name "(py310-tf211)" is added.

Next, check that the venv is active by looking at the location of the python3 executable.

(py310-tf211) [juser@gpu005 myrsrchGrp]$ which python3
/ifs/groups/myrsrchGrp/venvs/py310-tf211/bin/python3
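
The same check can be made from inside Python, which is handy at the top of a script that must run in the venv. This sketch relies on the standard behavior that sys.prefix differs from sys.base_prefix while a venv is active. The helper name in_venv is made up for this example.

```python
import sys


def in_venv():
    """Return True if the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("venv active:", in_venv())
```

With the venv from this section active, the interpreter path should point into ./venvs/py310-tf211/bin.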

Update pip and setuptools (because there is a critical setuptools security fix):

(py310-tf211) [juser@gpu005 myrsrchGrp]$ python3 -m pip install -U pip setuptools
Requirement already satisfied: pip in ./venvs/py310-tf211/lib/python3.10/site-packages (22.2.2)
Collecting pip
  Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Requirement already satisfied: setuptools in ./venvs/py310-tf211/lib/python3.10/site-packages (63.2.0)
Collecting setuptools
  Using cached setuptools-65.6.3-py3-none-any.whl (1.2 MB)
Installing collected packages: setuptools, pip
  Attempting uninstall: setuptools
    Found existing installation: setuptools 63.2.0
    Uninstalling setuptools-63.2.0:
      Successfully uninstalled setuptools-63.2.0
  Attempting uninstall: pip
    Found existing installation: pip 22.2.2
    Uninstalling pip-22.2.2:
      Successfully uninstalled pip-22.2.2
Successfully installed pip-22.3.1 setuptools-65.6.3

Install TensorFlow

Install TensorFlow 2.11.0 using pip:

(py310-tf211) [juser@gpu005 myrsrchGrp]$ python3 -m pip install tensorflow==2.11.0
Collecting tensorflow==2.11.0
  Downloading tensorflow-2.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 588.3/588.3 MB 3.7 MB/s eta 0:00:00
...
Installing collected packages: tensorboard-plugin-wit, pyasn1, libclang, flatbuffers, wrapt, wheel, urllib3, typing-extensions, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, six, rsa, pyparsing, pyasn1-modules, protobuf, oauthlib, numpy, MarkupSafe, markdown, keras, idna, grpcio, gast, charset-normalizer, certifi, cachetools, absl-py, werkzeug, requests, packaging, opt-einsum, h5py, google-pasta, google-auth, astunparse, requests-oauthlib, google-auth-oauthlib, tensorboard, tensorflow
Successfully installed MarkupSafe-2.1.1 absl-py-1.3.0 astunparse-1.6.3 cachetools-5.2.0 certifi-2022.9.24 charset-normalizer-2.1.1 flatbuffers-22.11.23 gast-0.4.0 google-auth-2.15.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.51.1 h5py-3.7.0 idna-3.4 keras-2.11.0 libclang-14.0.6 markdown-3.4.1 numpy-1.23.5 oauthlib-3.2.2 opt-einsum-3.3.0 packaging-21.3 protobuf-3.19.6 pyasn1-0.4.8 pyasn1-modules-0.2.8 pyparsing-3.0.9 requests-2.28.1 requests-oauthlib-1.3.1 rsa-4.9 six-1.16.0 tensorboard-2.11.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.11.0 tensorflow-estimator-2.11.0 tensorflow-io-gcs-filesystem-0.28.0 termcolor-2.1.1 typing-extensions-4.4.0 urllib3-1.26.13 werkzeug-2.2.2 wheel-0.38.4 wrapt-1.14.1
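
To confirm which versions ended up in the venv without importing TensorFlow (which is slow and prints log messages), the package metadata can be queried with the standard library. This is a small sketch; the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package in ("tensorflow", "keras", "numpy"):
    print(package, installed_version(package) or "not installed")
```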

Test TensorFlow

Run a simple one-line test to create a random 1000x1000 tensor and perform a reduce_sum():

(py310-tf211) [juser@gpu005 myrsrchGrp]$ python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2023-01-10 18:31:18.187484: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-10 18:31:18.290904: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2.11.0
2023-01-10 18:31:24.326718: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-10 18:31:24.840312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30972 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:3b:00.0, compute capability: 7.0
tf.Tensor(354.97415, shape=(), dtype=float32)

The result tf.Tensor(354.97415, shape=(), dtype=float32) will be different for you since it is a random tensor.
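
The one-line test shows that a GPU device was created only through the log messages. A more explicit check, sketched below, asks TensorFlow for the list of physical GPU devices. The helper name tf_gpu_summary is made up for this example, and the guard around the import is only there so the snippet also runs where TensorFlow is not installed.

```python
import importlib.util


def tf_gpu_summary():
    """Return a one-line summary of the TensorFlow version and visible GPUs."""
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow is not installed in this environment"
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return f"TensorFlow {tf.__version__}: {len(gpus)} GPU(s) visible"


print(tf_gpu_summary())
```

On a GPU node with the modulefiles loaded, the summary should report one GPU. A count of zero suggests that the CUDA libraries were not found.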

For interactive use, remember to deactivate the venv once you are done with TensorFlow:

(py310-tf211) [juser@gpu005 myrsrchGrp]$ deactivate
[juser@gpu005 myrsrchGrp]$ which python3
/ifs/opt/python/gcc/3.10.2/bin/python3

Note that the prompt loses the "(py310-tf211)" tag.

Job scripts

Job scripts will need to set up the same environment before running the Python script.

Example Python script

Create and save this file as "test_tf.py":

#!/usr/bin/env python3
import tensorflow as tf

print(tf.__version__)
print(tf.reduce_sum(tf.random.normal([1000, 1000])))

Example job script

In the same directory as the test_tf.py file, create a job script named "tf_job.sh" to run the above TensorFlow computation:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-gpu=12
#SBATCH --mem-per-gpu=40G
#SBATCH --time=0:15:00

module use /ifs/opt_cuda/modulefiles
module load python/gcc/3.10
module load cuda11.2/toolkit cuda11.2/blas cuda11.2/fft tensorrt-cuda11.2/7.2.3.4 cudnn8.7-cuda11.2 cutensor-cuda11.2

# activate TF venv
source /ifs/groups/myrsrchGrp/venvs/py310-tf211/bin/activate

python3 test_tf.py

Submit the job:

[juser@picotte001 ~]$ sbatch tf_job.sh

The output will be in a file named "slurm-NNNNNNN.out" where "NNNNNNN" is the job ID. Its contents should be something like:

2022-12-05 07:51:42.721183: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-05 07:51:42.841257: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2.11.0
2022-12-05 07:51:44.995935: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-05 07:51:45.495916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30972 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
tf.Tensor(213.94812, shape=(), dtype=float32)

CAUTIONS

XLA_FLAGS environment variable

Some TensorFlow applications need the XLA_FLAGS environment variable set so that XLA can find the CUDA Toolkit installation:

export XLA_FLAGS="--xla_gpu_cuda_data_dir=/cm/shared/apps/cuda11.2"

Set it in your job script, before the line that runs your TF code.
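
If editing the job script is not convenient, the variable can instead be set at the top of the Python script. This is a sketch under the assumption that the variable must be in place before TensorFlow is imported. The path is the one shown above, and the helper name add_xla_cuda_data_dir is made up for this example. It appends to any existing XLA_FLAGS value instead of overwriting it.

```python
import os

# Path taken from the export line above; adjust if your CUDA Toolkit lives elsewhere.
CUDA_DATA_DIR = "/cm/shared/apps/cuda11.2"


def add_xla_cuda_data_dir(cuda_dir, env=None):
    """Add --xla_gpu_cuda_data_dir to XLA_FLAGS, keeping any flags already set."""
    env = os.environ if env is None else env
    flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
    existing = env.get("XLA_FLAGS", "")
    if flag not in existing.split():
        env["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return env["XLA_FLAGS"]


print(add_xla_cuda_data_dir(CUDA_DATA_DIR))

# Import TensorFlow only after XLA_FLAGS has been set, for example:
# import tensorflow as tf
```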

Missing libdevice.10.bc

Even when the environment variables correctly define the path to the CUDA Toolkit installation, TF can have trouble finding the library file libdevice.10.bc.

The workaround is to copy it to the same directory as your Python TF script:

[juser@gpu001 ~]$ cp $CUDA_DIR/nvvm/libdevice/libdevice.10.bc .
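
The copy can also be done from Python, for example as a setup step before the TF code runs. This sketch assumes, as the cp command above does, that the CUDA_DIR environment variable points at the CUDA Toolkit installation. The helper name copy_libdevice is made up for this example.

```python
import os
import shutil
from pathlib import Path


def copy_libdevice(cuda_dir, dest_dir="."):
    """Copy libdevice.10.bc from the CUDA Toolkit tree into dest_dir."""
    source = Path(cuda_dir) / "nvvm" / "libdevice" / "libdevice.10.bc"
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(source, dest_dir))


cuda_dir = os.environ.get("CUDA_DIR")
if cuda_dir is None:
    print("CUDA_DIR is not set; load the CUDA modulefiles first")
else:
    try:
        print("copied to", copy_libdevice(cuda_dir))
    except FileNotFoundError as err:
        print(err)
```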

References

[1] Python 3.10 Documentation - venv

[2] Real Python - Python Virtual Environments: A Primer