Why a CUDA kernel does not launch in VS 2013 with CUDA 9.0

【Question title】: Why does a CUDA kernel not launch in VS 2013 with CUDA 9.0?
【Posted】: 2018-05-29 22:31:27
【Question】:

I wrote a CUDA-based parallel program on Windows (GeForce GT 720M). I have installed the CUDA 9.0 Toolkit and Visual Studio 2013. Everything seems fine, but when I compile the code and run it, the output is wrong.

The program is:

#include <stdio.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

__global__ void square(float * d_out, float * d_in)
{
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = 50;
}

int main(int argc, char ** argv)
{
    const int ARRAY_SIZE = 64;
    const int ARRAY_BYTES = ARRAY_SIZE * sizeof(float);

    // generate the input array on the host
    float h_in[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        h_in[i] = float(i);
    }
    float h_out[ARRAY_SIZE];

    // declare GPU memory pointers
    float * d_in;
    float * d_out;

    // allocate GPU memory
    cudaMalloc((void **) &d_in, ARRAY_BYTES);
    cudaMalloc((void **) &d_out, ARRAY_BYTES);

    // transfer the array to the GPU
    cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice);

    // launch the Kernel
    square<<<1, ARRAY_SIZE>>>(d_out, d_in);

    // copy back the result array to the CPU
    cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost);

    // print out the resulting array
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        printf("%f", h_out[i]);
        printf(((i % 4) != 3) ? "\t" : "\n");
    }

    // free GPU memory allocation
    cudaFree(d_in);
    cudaFree(d_out);

    getchar();
    return 0;
}

When I run it, the output is:

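I also compiled it with nvcc square.cu, but the output is the same. VS reports a syntax error on the kernel launch, but I don't think it is related to the output (the image is from another program):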

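【Comments】:

First, add proper cuda error checking to your code. Then recompile, run, and report any errors it indicates.
@RobertCrovella Which answer should I use?
@RobertCrovella CUDA error: no kernel image is available for execution on the device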
So your code was not compiled for an architecture that will run on your GPU.
The GeForce GT 720M is a GF117 (Fermi-based) GPU. CUDA 9 dropped support for Fermi devices. CUDA 8 is the last CUDA toolkit that still supports Fermi GPUs.
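
The error checking suggested in the comments can be sketched as follows. This is a minimal illustration, not the code from the linked answer: the helper cuda_ok and the kernel fill are hypothetical names introduced here. The point it shows is that a kernel launch returns no status itself, so the launch error (such as "no kernel image is available for execution on the device") only surfaces if cudaGetLastError() is queried afterwards.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original thread): print a message and
// return false when a CUDA runtime call did not succeed.
static bool cuda_ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error in %s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical kernel that mirrors the question's debugging version:
// every thread writes the constant 50.
__global__ void fill(float *d_out)
{
    d_out[threadIdx.x] = 50.0f;
}

int main()
{
    const int N = 64;
    float h_out[N];
    float *d_out = NULL;

    if (!cuda_ok(cudaMalloc((void **)&d_out, N * sizeof(float)), "cudaMalloc"))
        return 1;

    fill<<<1, N>>>(d_out);

    // A launch does not return a status; ask the runtime for it explicitly.
    if (!cuda_ok(cudaGetLastError(), "kernel launch"))
        return 1;
    // Errors raised while the kernel runs are reported at the next sync point.
    if (!cuda_ok(cudaDeviceSynchronize(), "kernel execution"))
        return 1;

    if (!cuda_ok(cudaMemcpy(h_out, d_out, N * sizeof(float),
                            cudaMemcpyDeviceToHost), "cudaMemcpy"))
        return 1;

    printf("h_out[0] = %f\n", h_out[0]);
    cudaFree(d_out);
    return 0;
}
```

Without the two checks after the launch, the program in the question carries on silently and prints whatever happens to be in h_out, which is why the output only looked "wrong" rather than reporting a failure.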

Tags: windows visual-studio parallel-processing cuda

【Solution 1】:

The problem is the CUDA Toolkit version. The GeForce GT 720M has Compute Capability 2.1, which CUDA 9.0 no longer supports; CUDA 8.0 is the toolkit to use with it.
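
To check a device's compute capability before blaming the code, the runtime API can be queried directly. The sketch below is an illustration under one stated assumption: the helper supported_by_cuda9 is a hypothetical name that encodes the rule from this thread, namely that CUDA 9 dropped Fermi (compute capability 2.x), so 3.0 is the lowest capability it targets.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: CUDA 9 dropped Fermi (compute capability 2.x),
// so a device needs a major version of at least 3.
static bool supported_by_cuda9(int major)
{
    return major >= 3;
}

int main()
{
    int runtimeVersion = 0;
    if (cudaRuntimeGetVersion(&runtimeVersion) == cudaSuccess)
        printf("CUDA runtime version: %d\n", runtimeVersion);

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                    cudaGetErrorString(err));
            return 1;
        }
        // prop.major / prop.minor hold the compute capability, e.g. 2.1.
        printf("Device %d: %s, compute capability %d.%d, %s by CUDA 9\n",
               dev, prop.name, prop.major, prop.minor,
               supported_by_cuda9(prop.major) ? "supported" : "NOT supported");
    }
    return 0;
}
```

Device queries like these do not need any device code, which fits the observation in this thread that host-side calls such as cudaMemcpy worked while the kernel launch failed.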

【Discussion】:

【Solution 2】:

Here is a table of CUDA toolkit versions and the compute capabilities they support.

See section GPUs supported

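【Discussion】:

I can't believe that in 2020 the installer doesn't check your hardware and warn you that it isn't supported. This is bananas. I spent hours banging my head against the wall because I could copy data from host to device and back, but couldn't call a __global__ function.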