Kernel crashes while trying to do a simple value assignment (answers)

【Question title】: Kernel crashes while trying to do a simple value assignment
【Posted】: 2018-01-16 16:09:32
【Question】:

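I'm learning CUDA and am still a beginner. I'm attempting a simple task, but my code crashes when I run it and I don't know why. Any help would be greatly appreciated.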

Edit: It crashes in cudaMemcpy. In the Image struct, pixelVal is of type int**. Could that be the cause?
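The Image class is not shown, so this is an assumption: pixelVal is allocated the usual way for an int**. That means an array of rows pointers, each pointing to its own row of cols ints. In that case the pixels are not one contiguous block.

The call cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), ...) then reads n * sizeof(int) bytes starting at the pointer array. That array holds only rows pointers, so the copy can read past the allocation and crash. Even when it does not crash, it copies pointer values rather than pixels.

Below is a minimal sketch of staging the data through a contiguous buffer instead. flattenRows and unflattenRows are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a jagged image (rowsPtr[i] points to `cols` ints)
// into one contiguous row-major buffer that cudaMemcpy can copy in one call.
std::vector<int> flattenRows(int** rowsPtr, int rows, int cols)
{
    assert(rowsPtr != nullptr && rows >= 0 && cols >= 0);
    std::vector<int> flat(static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            flat[static_cast<std::size_t>(i) * cols + j] = rowsPtr[i][j];
    return flat;
}

// Hypothetical helper: write a contiguous row-major buffer back into the
// jagged rows, e.g. after copying the kernel's result back to the host.
void unflattenRows(const std::vector<int>& flat, int** rowsPtr, int rows, int cols)
{
    assert(flat.size() == static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            rowsPtr[i][j] = flat[static_cast<std::size_t>(i) * cols + j];
}
```

With these helpers, the host code would pass flat.data() (n * sizeof(int) bytes) to cudaMemcpy. After copying the result back, it would call unflattenRows. Alternatively, Image could keep its pixels in a single contiguous allocation with row pointers into it, which avoids the extra copies.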

Original C++ code:

void Image::reflectImage(bool flag, Image& oldImage)
/*Reflects the Image based on users input*/
{
    int rows = oldImage.N;
    int cols = oldImage.M;
    Image tempImage(oldImage);

    for(int i = 0; i < rows; i++)
    {
        for(int j = 0; j < cols; j++)
        tempImage.pixelVal[rows - (i + 1)][j] = oldImage.pixelVal[i][j];
    }
    oldImage = tempImage;
}
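The loop above sends pixel (i, j) to (rows - (i + 1), j). A CUDA kernel sees the image through int* as a single row-major buffer. In that layout, the same mapping sends flat index i * cols + j to (rows - 1 - i) * cols + j.

A host-side reference like the sketch below is handy for checking a kernel's output. reflectRows is a hypothetical name, and a std::vector<int> in row-major order is assumed.

For comparison, the kernel's a[(r - i * c) + j] does not follow this formula. With r = 2, c = 3, i = 1 and j = 0 it indexes a[-1].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the row reflection on a flat,
// row-major rows x cols image: row i of src becomes row rows - 1 - i.
std::vector<int> reflectRows(const std::vector<int>& src, int rows, int cols)
{
    assert(src.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<int> dst(src.size());
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            dst[static_cast<std::size_t>(rows - 1 - i) * cols + j] =
                src[static_cast<std::size_t>(i) * cols + j];
    return dst;
}
```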

My CUDA kernel and code:

#define NTPB 512
__global__ void fliph(int* a, int* b, int r, int c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    if (i >= r || j >= c)
        return;
    a[(r - i * c) + j] = b[i * c + j];
}
void Image::reflectImage(bool flag, Image& oldImage)
/*Reflects the Image based on users input*/
{
    int rows = oldImage.N;
    int cols = oldImage.M;
    Image tempImage(oldImage);
    if(flag == true) //horizontal reflection
    {
     //Allocate device memory
     int* dpixels;
     int* oldPixels;
     int n = rows * cols;
     cudaMalloc((void**)&dpixels, n * sizeof(int));
     cudaMalloc((void**)&oldPixels, n * sizeof(int));
     cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);
     cudaMemcpy(oldPixels, oldImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);
     int nblks = (n + NTPB - 1) / NTPB;
     fliph<<<nblks, NTPB>>>(dpixels, oldPixels, rows, cols);
     cudaMemcpy(tempImage.pixelVal, dpixels, n * sizeof(int), cudaMemcpyDeviceToHost);
     cudaFree(dpixels);
     cudaFree(oldPixels);
    }
    oldImage = tempImage;
}

【Comments】:

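Your blocks and grid are one-dimensional, so why are you using 2D indexing in the kernel? The variable j will always be 0 in the kernel.
From a quick review, the code looks fine apart from @sgar91's comment. I suggest adding error checking to your program to get more information about the problem. Take a look at this post.
I count 7 CUDA API calls and can't see any error checking at all! First step: check for errors and try to narrow down the source of the problem.
@BhrugeshPatel: You say it crashes on memcpy, but there are no memcpy calls in that code. Do you mean cudaMemcpy? There are three of those. Which one? Details matter here. Help us help you...
@talonmies Yes, I mean cudaMemcpy. It crashes on the first cudaMemcpy: cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);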
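A common pattern for the error checking suggested above is a macro that wraps every runtime call. On failure it stops with the file, the line, and the cudaGetErrorString message. The sketch below assumes the CUDA runtime API; CUDA_CHECK, fill and fillOnDevice are hypothetical names.

A kernel launch returns no status of its own. Launch errors are therefore read with cudaGetLastError(), and errors that occur while the kernel runs are picked up by cudaDeviceSynchronize().

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: if a CUDA runtime call does not return
// cudaSuccess, print where it failed and why, then exit.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                         __LINE__, #call, cudaGetErrorString(err_));      \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical kernel used only to show the checks: sets n ints to value.
__global__ void fill(int* out, int n, int value)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n)
        out[k] = value;
}

// Every runtime call is wrapped; the launch itself is checked afterwards.
void fillOnDevice(int* host, int n, int value)
{
    assert(host != nullptr && n > 0);
    int* dev = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&dev, n * sizeof(int)));
    fill<<<(n + 255) / 256, 256>>>(dev, n, value);
    CUDA_CHECK(cudaGetLastError());      // launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize()); // errors raised while the kernel ran
    CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
}
```

One caveat applies to the first cudaMemcpy here. If cudaMemcpy is handed a host pointer to less memory than the byte count says, the process may segfault while reading host memory before any error code comes back. Checking return values narrows the problem down, but it may not catch that particular crash.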

Tags: parallel-processing cuda

【Solution 1】:

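You have to create a 2D grid to process the image with the 2D indices i and j. As it stands, the launch is one-dimensional, so j is always 0. The kernel therefore only handles one element per row (column 0) of the image.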
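The effect of the 1D launch can be checked without a GPU by replaying the question's index arithmetic on the host. The sketch below is a hypothetical simulation; coveredBy1DLaunch is not a CUDA API.

With a 1D grid, blockIdx.y and threadIdx.y are 0 and blockDim.y is 1. As a result, j is always 0, and only the elements (i, 0) with i < rows pass fliph's bounds check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side replay of fliph<<<nblks, NTPB>>>(...): returns a
// row-major rows x cols mask of the elements (i, j) that pass the bounds check.
std::vector<bool> coveredBy1DLaunch(int rows, int cols, int threadsPerBlock)
{
    assert(rows > 0 && cols > 0 && threadsPerBlock > 0);
    int n = rows * cols;
    int nblks = (n + threadsPerBlock - 1) / threadsPerBlock; // as in the question
    std::vector<bool> covered(static_cast<std::size_t>(n), false);
    for (int bx = 0; bx < nblks; ++bx) {
        for (int tx = 0; tx < threadsPerBlock; ++tx) {
            int i = bx * threadsPerBlock + tx; // blockIdx.x * blockDim.x + threadIdx.x
            int j = 0;                         // blockIdx.y * blockDim.y + threadIdx.y is 0 in a 1D launch
            if (i >= rows || j >= cols)
                continue;                      // same early return as fliph
            covered[static_cast<std::size_t>(i) * cols + j] = true;
        }
    }
    return covered;
}
```

For a 4 x 3 image with 512 threads per block, the mask marks only flat indices 0, 3, 6 and 9.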

To set up 2D indexing, create a 2D block and a 2D grid like this:

const int BLOCK_DIM = 16;

dim3 Block(BLOCK_DIM,BLOCK_DIM);

dim3 Grid;
Grid.x = (cols + Block.x - 1)/Block.x;
Grid.y = (rows + Block.y - 1)/Block.y;

fliph<<<Grid, Block>>>(dpixels, oldPixels, rows, cols);
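This grid puts columns on x and rows on y. The question's fliph, however, takes its row index i from the x dimension. With this launch, the kernel should read the column from x and the row from y. The write index also needs to be (rows - 1 - i) * cols + j.

Below is a sketch combining both fixes, assuming a contiguous row-major buffer. fliphRows and flipRowsOnDevice are hypothetical names, and error checking is left out for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Sketch of a kernel matching Grid.x over columns and Grid.y over rows.
__global__ void fliphRows(int* dst, const int* src, int rows, int cols)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x; // column
    int i = blockIdx.y * blockDim.y + threadIdx.y; // row
    if (i >= rows || j >= cols)
        return;
    // Row i moves to row rows - 1 - i, as in the original C++ loop.
    dst[(rows - 1 - i) * cols + j] = src[i * cols + j];
}

// Hypothetical host wrapper: copy a contiguous image in, flip it, copy it out.
// Each runtime call should be error-checked in real code.
void flipRowsOnDevice(const int* hostSrc, int* hostDst, int rows, int cols)
{
    assert(hostSrc != nullptr && hostDst != nullptr && rows > 0 && cols > 0);
    std::size_t bytes = static_cast<std::size_t>(rows) * cols * sizeof(int);
    int* dSrc = nullptr;
    int* dDst = nullptr;
    cudaMalloc((void**)&dSrc, bytes);
    cudaMalloc((void**)&dDst, bytes);
    cudaMemcpy(dSrc, hostSrc, bytes, cudaMemcpyHostToDevice);

    const int BLOCK_DIM = 16;
    dim3 block(BLOCK_DIM, BLOCK_DIM);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    fliphRows<<<grid, block>>>(dDst, dSrc, rows, cols);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hostDst, dDst, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dSrc);
    cudaFree(dDst);
}
```

Putting columns on x is the usual choice. It means consecutive threads in a warp read and write consecutive addresses within a row, which lets those accesses coalesce.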

【Comments】: