Not possible to build TensorFlow 1.15.2 with CUDA 10.0.130
Our initial aim was to replicate the NLPL modules with EasyBuild. This implied using TensorFlow 1.15.2 with CUDA 10.0.
However, I ran into issues building TF with the CUDA/10.0.130 module provided by Saga.
The problem is that this version of CUDA stores its header files in a non-standard location (/cluster/software/CUDA/10.0.130/include). For all more recent CUDA versions, the location is /cluster/software/CUDA/VERSION/targets/x86_64-linux/include. The EB EasyBlock for TensorFlow looks for CUDA headers in this standard location. With the CUDA/10.0.130 module, it can't find the headers and fails with "Failed to isolate path to cublas_api.h". If I use CUDA/10.1.243, the TF build works without any trouble.
I have not yet figured out whether there is some issue with the Saga installation of CUDA 10.0.130, or whether this is the expected behavior and EB simply does not support older versions of CUDA.
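To see which layout a given CUDA module actually uses, something like the following could be run on Saga. This is only a minimal sketch: the function name find_cuda_header is made up for illustration, and it is not the logic the EasyBlock itself uses. It simply checks the two include directories mentioned above for cublas_api.h.

```python
import os


def find_cuda_header(cuda_root, header="cublas_api.h"):
    """Return the candidate include directories under cuda_root that contain header.

    Hypothetical helper for diagnosis only; it does not reproduce the
    EasyBuild TensorFlow EasyBlock's own header lookup.
    """
    candidates = [
        # Layout seen with the CUDA/10.0.130 module
        os.path.join(cuda_root, "include"),
        # Layout seen with more recent CUDA modules
        os.path.join(cuda_root, "targets", "x86_64-linux", "include"),
    ]
    return [d for d in candidates if os.path.isfile(os.path.join(d, header))]


if __name__ == "__main__":
    # Installation prefix taken from the paths quoted above
    for version in ("10.0.130", "10.1.243"):
        root = os.path.join("/cluster/software/CUDA", version)
        print(version, find_cuda_header(root) or "header not found")
```

If one of the directories is a symlink to the other, both will be listed, since the check follows symlinks.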
My question to @sabryr is: have you encountered this problem?
My question to @oe is: should we continue to struggle with CUDA 10.0.130, or is it OK to move on with CUDA 10.1.243?