Unhandled exception with cudaMemcpy2D

【Question title】: Unhandled exception with cudaMemcpy2D
【Posted】: 2020-05-06 07:23:09
【Question】:

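I am new to C++ (as well as Cuda and OpenCV), so I apologize for any mistakes on my side. I have existing code that uses Cuda. Until recently it took a decoded .png as input, but now I use a camera to generate live images. These images are the new input for the code. Here it is: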

using namespace cv;

INT height = 2160;
INT width = 3840;
Mat image(height, width, CV_8UC3);
size_t pitch;
uint8_t* image_gpu;

// capture image
VideoCapture camera(0);
camera.set(CAP_PROP_FRAME_WIDTH, width);
camera.set(CAP_PROP_FRAME_HEIGHT, height);
camera.read(image);

// here I checked if image is definitely still a CV_8UC3 Mat with the initial height and width; and it is

cudaMallocPitch(&image_gpu, &pitch, width * 4, height);

// here I use cv::Mat::data to get the pointer to the data of the image:
cudaMemcpy2D(image_gpu, pitch, image.data, width*4, width*4, height, cudaMemcpyHostToDevice);

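The code compiles, but I get an "Exception thrown" at the last line (cudaMemcpy2D) with the following error: Exception thrown at 0x00007FFE838D6660 (nvcuda.dll) in realtime.exe: 0xC0000005: Access violation reading location 0x000001113AE10000.

Google did not give me an answer, and I do not know how to proceed from here.

Thanks for any hints!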

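【Comments】:

Shouldn't the source pitch (the 4th parameter) be width here?
If your pixel type is CV_8UC3, i.e. 3 channels, why are you multiplying by 4? Please confirm that your total matrix data length is actually width*height*3. Also, did you check the return value of cudaMallocPitch?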
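
The point raised in the comments can be checked with plain arithmetic. The following is a minimal host-only sketch, assuming image is a continuous CV_8UC3 Mat of 3840 x 2160 pixels as the question states. bytesTouched is a hypothetical helper written for this illustration; it is not part of CUDA or OpenCV.

```cpp
#include <cstddef>

// Hypothetical helper: number of source bytes a pitched 2D copy reads.
// Rows start `pitch` bytes apart, and `widthBytes` bytes are read from each,
// so the last byte read is at (height - 1) * pitch + widthBytes.
constexpr std::size_t bytesTouched(std::size_t pitch, std::size_t widthBytes,
                                   std::size_t height)
{
    return height == 0 ? 0 : (height - 1) * pitch + widthBytes;
}

constexpr std::size_t kWidth  = 3840;
constexpr std::size_t kHeight = 2160;

// Assumption: a continuous CV_8UC3 Mat holds 3 bytes per pixel, no row padding.
constexpr std::size_t kHostBytes = kWidth * kHeight * 3;

// The call in the question passes width * 4 as both source pitch and width.
constexpr std::size_t kRequested = bytesTouched(kWidth * 4, kWidth * 4, kHeight);

// The requested read extends past the end of the host buffer.
static_assert(kRequested > kHostBytes, "copy reads beyond the Mat data");
```

Under these assumptions the host buffer holds 24,883,200 bytes while the copy asks for 33,177,600, which is consistent with an access violation while reading. With width * 3 for both values the read would end exactly at the end of the buffer.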

Tags: c++ opencv cuda

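【Solution 1】:

A fairly generic way to copy an OpenCV Mat to device memory allocated with cudaMallocPitch is to use the step member of the Mat object. Also, when allocating device memory, you need a clear mental picture of how the device memory will be laid out and how the Mat object will be copied into it. Here is a simple example that demonstrates the procedure for a video frame captured with VideoCapture.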

#include<iostream>
#include<cuda_runtime.h>
#include<opencv2/opencv.hpp>

using std::cout;
using std::endl;

size_t getPixelBytes(int type)
{
    switch(type)
    {
        case CV_8UC1:
        case CV_8UC3:
            return sizeof(uint8_t);
            break;
        case CV_16UC1:
        case CV_16UC3:
            return sizeof(uint16_t);
            break;
        case CV_32FC1:
        case CV_32FC3:
            return sizeof(float);
            break;
        case CV_64FC1:
        case CV_64FC3:
            return sizeof(double);
            break;
        default:
            return 0;
    }
}

int main()
{
    cv::VideoCapture cap(0);
    cv::Mat frame;

    if(cap.grab())
    {
        cap.retrieve(frame);
    }
    else
    {
        cout<<"Cannot read video"<<endl;
        return -1;
    }

    uint8_t* gpu_image;
    size_t gpu_pitch;

    //Get the number of bytes occupied by a single channel element. VideoCapture mostly returns CV_8UC3 frames, for which this is 1, but compute it just in case.
    size_t pixelBytes = getPixelBytes(frame.type());

    //Number of actual data bytes occupied by a row (channels() is a member function).
    size_t frameRowBytes = frame.cols * frame.channels() * pixelBytes;

    //Allocate pitch linear memory on device
    cudaMallocPitch(&gpu_image, &gpu_pitch, frameRowBytes , frame.rows);

    //Copy memory from frame to device memory
    cudaMemcpy2D(gpu_image, gpu_pitch, frame.ptr(), frame.step, frameRowBytes, frame.rows, cudaMemcpyHostToDevice);

    //Rest of the code ...
    return 0;
}

Disclaimer: The code was written in the browser and has not been tested. Please add CUDA error checking as needed.
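
To make the roles of step and pitch more concrete, here is a host-only sketch of what a pitched 2D copy does row by row. It does not call CUDA or OpenCV. copy2D and toPitched are hypothetical helpers that only mimic the parameter order of cudaMemcpy2D (destination, destination pitch, source, source pitch, width in bytes, height); they are not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in for a pitched 2D copy: copies `height` rows
// of `widthBytes` bytes each. Consecutive rows start `spitch` bytes apart in
// the source and `dpitch` bytes apart in the destination.
void copy2D(unsigned char* dst, std::size_t dpitch,
            const unsigned char* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height)
{
    // A row cannot be wider than the distance between two row starts.
    assert(widthBytes <= spitch && widthBytes <= dpitch);
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(dst + row * dpitch, src + row * spitch, widthBytes);
}

// Copies a tightly packed image (source pitch == widthBytes) into a buffer
// whose rows are padded to `dpitch` bytes, similar in spirit to copying a
// continuous Mat into memory obtained from a pitched allocation.
std::vector<unsigned char> toPitched(const std::vector<unsigned char>& src,
                                     std::size_t widthBytes,
                                     std::size_t height,
                                     std::size_t dpitch)
{
    // The source must really contain height rows of widthBytes bytes.
    assert(src.size() >= widthBytes * height);
    std::vector<unsigned char> dst(dpitch * height, 0);
    copy2D(dst.data(), dpitch, src.data(), widthBytes, widthBytes, height);
    return dst;
}
```

For example, calling toPitched on a 2-row image with 3 bytes per row and a destination pitch of 8 yields a 16-byte buffer in which the second row starts at offset 8 and the padding bytes stay zero. The source pitch describes the layout of the source only, which is why the answer passes frame.step rather than a value derived from the destination.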

【Comments】: