Compile custom tensorflow op for CUDA

【Question title】: Compile custom tensorflow op for CUDA
【Posted】: 2017-11-02 07:47:31
【Question】:

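I am following the guide in the tensorflow documentation to develop a custom op for tensorflow that needs GPU support. While tracking down errors in my own code, I went back to the example in the documentation and tried to compile the referenced code example: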

#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

__global__ void AddOneKernel(const int* in, const int N, int* out) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
       i += blockDim.x * gridDim.x) {
    out[i] = in[i] + 1;
  }
}

void AddOneKernelLauncher(const int* in, const int N, int* out) {
  AddOneKernel<<<32, 256>>>(in, N, out);
}

#endif

using the command suggested in the documentation:

nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

where $TF_INC is correctly replaced with the tensorflow include path. Unfortunately, this produces a lot of errors:

/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1294): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1300): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1306): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1312): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1318): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1324): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1330): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1336): error: expression must have arithmetic, unscoped enum, or pointer type

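and many more like these.

I found that this may be related to an unsupported nvcc / gcc / OS combination. I did not set up the machine myself (and I actually have no sudo rights). I have nvcc version 7.5.17 and gcc version 4.9.3 on Ubuntu 16.04.2. Ubuntu 16.04.2 is not listed among the systems supported by CUDA 7.5. That may be a problem, but I found many people claiming it works on 16.04. Also, I successfully compiled Tensorflow with GPU support on this machine.

Furthermore, the errors are related to the Tensor #include in the code: the code compiles successfully without this line. I have not yet tried whether the demo op works without this include, but my own op fails with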

2017-06-01 09:36:14.679685: E tensorflow/stream_executor/cuda/cuda_driver.cc:1067] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
2017-06-01 09:36:14.679777: F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed

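Two questions:

Why do I need to include this Eigen Tensor header, even though the demo op does not actually use Eigen Tensor?
Where do the errors come from, and how can they be solved? Do you think this is related to the unsupported system configuration?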

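【Comments】:

The errors clearly show that you are compiling the code with gcc 5 as the host compiler for nvcc. That is not supported in CUDA 7.5.
Oh, sorry, I really did not see that. I thought nvcc was using the system default gcc, which is 4.9.
As far as I know, the system default compiler in Ubuntu 16 releases is a gcc 5.3 snapshot.
Hmm, for me gcc --version gives 4.9.3. Maybe a wrong symlink.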
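
A quick way to see which compilers are actually installed, along the lines of the comments above. This is only a sketch: the list of candidate compiler names is an assumption, and readlink -f assumes GNU coreutils as shipped with Ubuntu.

```shell
# Where does the plain "gcc" on PATH really point, and which version is it?
gcc_path="$(command -v gcc || true)"
if [ -n "$gcc_path" ]; then
  echo "gcc on PATH: $gcc_path -> $(readlink -f "$gcc_path")"
  echo "version: $(gcc -dumpversion)"
else
  echo "no gcc on PATH"
fi

# Versioned compilers that may be installed side by side (names are assumptions).
found=""
for cc in gcc-4.9 gcc-5; do
  if command -v "$cc" >/dev/null 2>&1; then
    echo "$cc: $(command -v "$cc")"
    found="$found $cc"
  else
    echo "$cc: not found"
  fi
done
```

If the paths in the nvcc error messages (here /usr/lib/gcc/x86_64-linux-gnu/5/...) do not match the version printed for plain gcc, nvcc is probably picking up a different host compiler than expected.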

Tags: c++ ubuntu cuda nvcc tensorflow-gpu

【Solution 1】:

OK, for those who run into the same problem: you can set the host compiler for nvcc with the -ccbin argument, as pointed out in this answer. Just set it to gcc-4.9.
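
As a rough sketch, this is the command from the question with -ccbin added. It assumes a compiler named gcc-4.9 is on the PATH; the HOST_CXX variable and the placeholder fallback for TF_INC are made up for illustration.

```shell
# Assumption: a host compiler called gcc-4.9 is installed; override HOST_CXX otherwise.
HOST_CXX="${HOST_CXX:-gcc-4.9}"
# Placeholder only: point TF_INC at your real tensorflow include path.
TF_INC="${TF_INC:-/path/to/tensorflow/include}"

# Same command as in the question, with -ccbin added to pin the host compiler.
set -- nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
  -I "$TF_INC" -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -ccbin "$HOST_CXX"

if command -v nvcc >/dev/null 2>&1 \
   && command -v "$HOST_CXX" >/dev/null 2>&1 \
   && [ -f cuda_op_kernel.cu.cc ]; then
  "$@"                   # everything is available: run the compile
else
  echo "would run: $*"   # tools or source file missing: only show the command
fi
```

-ccbin is the short form of --compiler-bindir; it should also accept a full path to the compiler if the name alone is not found on the PATH.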

【Comments】:

Please remember to come back in a few days and accept this answer, so that it drops off the unanswered queue for the CUDA tag.