CUDA: Invalid Device Pointer error when reallocating memory

【Question title】: CUDA: Invalid Device Pointer error when reallocating memory
【Posted】: 2016-10-03 04:46:22
【Question description】:

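In the code below, I simply call the function foo twice in a row from main. The function just allocates device memory and then increments the pointer. Then it exits and returns to main.

The first time foo is called, the memory is allocated correctly. But when I call foo again, as you can see in the output, the CUDA memory allocation fails with the error invalid device pointer.

I tried calling cudaThreadSynchronize() between the two foo calls, but it made no difference. Why does the memory allocation fail?

In fact, the error is caused by

matrixd += 3;

because if I don't do this, the error goes away.
But why, even though I am calling cudaFree()?

Please help me understand this.

My output is here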

Calling foo for the first time
Allocation of matrixd passed:
I came back to main safely :-)
I am going back to foo again :-)
Allocation of matrixd failed, the reason is:  invalid device pointer

My main() is here

#include<stdio.h>  
#include <cstdlib> // malloc(), free() 
#include <iostream> // cout, stream
#include <math.h>
#include <ctime> // time(), clock()
#include <bitset>
bool foo(  );

/***************************************
Main method.

****************************************/
 int main()  
 { 

    // Perform one warm-up pass and validate
    std::cout << "Calling foo for the first time"<<std::endl;
    foo();
    std::cout << "I came back to main safely :-) "<<std::endl;
    std::cout << "I am going back to foo again :-) "<<std::endl;
    foo( );    
    getchar();  
    return 0;  
 }

foo() is defined in this file:

#include <cuda.h>
#include <cuda_runtime_api.h>
#include <device_launch_parameters.h>
#include <iostream>

bool foo( )
{
    // Error return value
    cudaError_t status;
    // Number of bytes in the matrix.
    int bytes = 9 *sizeof(float);
        // Pointers to the device arrays
    float *matrixd=NULL; 

    // Allocate memory on the device to store matrix
    cudaMalloc((void**) &matrixd, bytes);
    status = cudaGetLastError();              //To check the error
    if (status != cudaSuccess) {                     
        std::cout << "Allocation of matrixd failed, the reason is:  " <<    cudaGetErrorString(status) << 
        std::endl;
        cudaFree(matrixd);                     //Free call for memory
        return false;
    }

    std::cout << "Allocation of matrixd passed: "<<std::endl;


    ////// Increment address 
    for (int i=0; i<3; i++){
         matrixd += 3;
    }

        // Free device memory
    cudaFree(matrixd);     

    return true;
}

Update

Now with better error checking. Also, I increment the device pointer only once. This time I get the following output:

Calling foo for the first time
Allocation of matrixd passed:
Increamented the pointer and going to free cuda memory:
GPUassert: invalid device pointer C:/Users/user/Desktop/Gauss/Gauss/GaussianEliminationGPU.cu 44

Line 44 is the cudaFree() call. Why does it still fail?

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

// GPU function for the direct (Gauss-Jordan) method.

bool foo( )
{

    // Error return value
    cudaError_t status;
    // Number of bytes in the matrix.
    int bytes = 9 *sizeof(float);
        // Pointers to the device arrays
    float *matrixd=NULL; 

    // Allocate memory on the device to store each matrix
    gpuErrchk( cudaMalloc((void**) &matrixd, bytes));
    //cudaMemset(outputMatrixd, 0, bytes);

    std::cout << "Allocation of matrixd passed: "<<std::endl;


    ////// Increment address 

         matrixd += 1;

         std::cout << "Increamented the pointer and going to free cuda memory: "<<std::endl;

         // Free device memory
    gpuErrchk( cudaFree(matrixd));     

    return true;
}

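【Comments】:

What happens if you check the return status of the cudaFree call?
@talonmies You are right. I just checked by calling cudaGetLastError() right after cudaFree, and yes, it shows that cudaFree failed. But why?
Right. So your problem is basically caused by incomplete error checking. You can see how to do it properly here. The memory allocation isn't failing.
I will look at your answer in the link, but are you sure there is no error in the deallocation (cudaGetLastError reports one)?

Tags: cuda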

【Solution 1】:

The real problem is in this code:

for (int i=0; i<3; i++){
     matrixd += 3;
}

// Free device memory
cudaFree(matrixd);

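You never allocated matrixd+9, so passing it to cudaFree is illegal and produces an invalid device pointer error. Because the first version of foo never checks the result of cudaFree, that error stays recorded until the next time you do error checking, which is the cudaGetLastError call after the subsequent cudaMalloc. If you read the documentation for any of these API calls, you will notice a warning that they may return errors from previous GPU operations. That is what is happening in this case.

Error checking in the CUDA runtime API can be subtle to get right. There is a robust, ready-made recipe for how to do it here. I recommend you use it.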
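To illustrate the point above, here is a minimal sketch of how foo could be restructured so that the value returned by cudaMalloc is the value handed back to cudaFree. The names foo_fixed and cursor are hypothetical and not part of the original code; the sketch assumes a machine with a working CUDA device and checks the return value of each runtime call directly instead of relying on cudaGetLastError.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of foo(): returns the first CUDA error it encounters,
// or cudaSuccess if both the allocation and the free succeeded.
cudaError_t foo_fixed()
{
    const size_t count = 9;        // number of floats in the matrix
    float *matrixd = nullptr;      // base pointer; never modified after cudaMalloc

    cudaError_t status = cudaMalloc((void **)&matrixd, count * sizeof(float));
    if (status != cudaSuccess) {
        return status;             // nothing was allocated, so nothing to free
    }

    // Walk the allocation with a separate cursor instead of moving matrixd.
    float *cursor = matrixd;
    for (int i = 0; i < 3; i++) {
        cursor += 3;               // ends one element past the end of the array
    }
    std::printf("cursor is %td floats past the base pointer\n", cursor - matrixd);

    // Free exactly the pointer value that cudaMalloc returned, and report
    // the result of this call rather than leaving it for a later check.
    return cudaFree(matrixd);
}
```

Calling foo_fixed() twice in a row from main, as in the question, should return cudaSuccess both times under these assumptions, because no failed cudaFree is left behind to surface at a later error check.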

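【Discussion】:

Your way of checking errors is very neat. Please see my update. I think my mistake is that I tried to increment a device pointer in a host function. I guess that is not allowed, and cudaFree is not happy with it. In fact, matrixd++ in a host function would point to some garbage on the host rather than some garbage in device memory.
@user3891236: I have already told you what the problem is. You cannot free an address that was never allocated. "Incrementing" the pointer is perfectly fine (although completely pointless in this case). But asking the API to free the incremented pointer is illegal, because the API never allocated memory at that pointer value.
Thank you very much for clearing this up for me. I learned a lot from you today, including the importance of checking CUDA errors!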