Template __host__ __device__ calling host defined functions

【Question title】: Template __host__ __device__ calling host defined functions
【Posted】: 2015-07-13 18:48:33
【Question】:

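When implementing CUDA code I often need utility functions that can be called from both device and host code, so I declare them as __host__ __device__. This works fine, and any device/host incompatibilities can be handled with #ifdef __CUDA_ARCH__.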
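As a rough illustration of the __CUDA_ARCH__ approach mentioned above, here is a minimal sketch. The function name inverse_sqrt is hypothetical, and the fallback macro block is an assumption added only so the same file also builds with a plain C++ compiler.

```cpp
#include <cmath>

// Assumption: when the file is not compiled by nvcc, let the CUDA
// qualifiers expand to nothing so an ordinary C++ compiler accepts it.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical utility function callable from both host and device code.
__host__ __device__
inline float inverse_sqrt(float x) {
#ifdef __CUDA_ARCH__
    // Device compilation pass: use the CUDA math function.
    return rsqrtf(x);
#else
    // Host compilation pass: fall back to the standard library.
    return 1.0f / std::sqrt(x);
#endif
}
```

__CUDA_ARCH__ is only defined while nvcc compiles device code, so each pass sees exactly one of the two branches.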

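The problem arises when the utility function is templated, for example on some functor type. If a template instance calls a __host__ function, I get this warning: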

calling a __host__ function from a __host__ __device__ function is not allowed
      detected during instantiation of "int foo(const T &) [with T=HostObject]"

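The only solution I know of is to define the function twice, once for the device and once for the host, under different names (I cannot overload on __host__ __device__). But that means duplicated code, and every other __host__ __device__ function that calls it must also be defined twice (even more duplication).

Simplified example: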

#include <cuda.h>
#include <iostream>

struct HostObject {
    __host__ 
    int value() const { return 42; }
};

struct DeviceObject {
    __device__ 
    int value() const { return 3; }
};

template <typename T> 
__host__ __device__ 
int foo(const T &obj) {
    return obj.value();
}

/*
template <typename T> 
__host__ 
int foo_host(const T &obj) {
    return obj.value();
}

template <typename T> 
__device__ 
int foo_device(const T &obj) {
    return obj.value();
}
*/

__global__ void kernel(int *data) {
    data[threadIdx.x] = foo(DeviceObject());
}

int main() {
    foo(HostObject());

    int *data;
    cudaMalloc((void**)&data, sizeof(int) * 64);
    kernel<<<1, 64>>>(data);
    cudaDeviceSynchronize();  // replaces the deprecated cudaThreadSynchronize()
    cudaFree(data);
}

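The warning is caused by the foo(HostObject()); call inside the main() function.

foo_host<> and foo_device<> are possible replacements for the problematic foo<>.

Is there a better solution? Can I prevent instantiation of foo() on the device side?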

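【Comments】:

No constructor is called inside foo(). The problem is exactly what the warning says. I am asking whether it can be fixed somehow without defining the generic functions twice.
The warning is caused by foo(HostObject()) in the main function. The constructors are not a problem, because they are generated automatically (by both the host and device compilers) until I declare my own.
Sorry, now I see what you mean. Without a compiler at hand it is hard to see where the error shows up, so I think it would be useful to mention it in your question.
I guess this is just a compiler issue. Thrust uses templates heavily and disables such warnings with #pragma hd_warning_disable, see github.com/thrust/thrust/blob/master/thrust/detail/config/…
Imagine HostObject is not under your control: it may come from a different library or module, so you cannot make it CUDA-enabled. It also means that the CUDA compiler will process the function and produce an error message if it cannot compile the body.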

Tags: cuda

【Solution 1】:

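You cannot prevent the instantiation of either half of a __host__ __device__ function template. If you instantiate the function by calling it on the host (device), the compiler will also instantiate the device (host) half.

As of CUDA 7.0, the best you can do for your use case is to suppress the warning with #pragma hd_warning_disable, as in the following example, and make sure the function is never called from the wrong side.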

#include <iostream>
#include <cstdio>

#pragma hd_warning_disable
template<class Function>
__host__ __device__
void invoke(Function f)
{
  f();
}

struct host_only
{
  __host__
  void operator()()
  {
    std::cout << "host_only()" << std::endl;
  }
};

struct device_only
{
  __device__
  void operator()()
  {
    printf("device_only(): thread %d\n", threadIdx.x);
  }
};

__global__
void kernel()
{
  // use from device with device functor
  invoke(device_only());

  // XXX error
  // invoke(host_only());
}

int main()
{
  // use from host with host functor
  invoke(host_only());

  kernel<<<1,1>>>();
  cudaDeviceSynchronize();

  // XXX error
  // invoke(device_only());

  return 0;
}
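Applied to the foo example from the question, the same pragma might be used roughly as follows. This is a sketch, not a tested recipe: the __CUDACC__ guards are an assumption added so the snippet also compiles with a plain C++ compiler, and the pragma only hides the warning. Calling foo with a HostObject from device code remains an error.

```cpp
// Assumption: allow a non-CUDA build by making the qualifiers no-ops.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

struct HostObject {
    __host__
    int value() const { return 42; }
};

struct DeviceObject {
    __device__
    int value() const { return 3; }
};

// Suppress the "calling a __host__ function from a __host__ __device__
// function" warning for the template that follows (nvcc only).
#ifdef __CUDACC__
#pragma hd_warning_disable
#endif
template <typename T>
__host__ __device__
int foo(const T &obj) {
    return obj.value();
}

#ifdef __CUDACC__
// Device-side use with the device-only object.
__global__ void kernel(int *data) {
    data[threadIdx.x] = foo(DeviceObject());
}
#endif
```

On the host, foo(HostObject()) then compiles without the warning, while the kernel keeps using foo(DeviceObject()).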

【Comments】:

Do you know whether #pragma hd_warning_disable or #pragma nv_exec_check_disable is documented anywhere?