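【Question title】: Row-major or column-major access of thread index in CUDA?
【Posted】: 2017-05-21 15:20:45
【Question】:

I am confused about whether an image is stored in row-major or column-major order in the device's global memory. Accessing the image in the two orders gives me two different output images.
When accessing in row-major order: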

int x = threadIdx.x + blockDim.x * blockIdx.x;
int y = threadIdx.y + blockDim.y * blockIdx.y;

int m = numCols * y + x;

if (x >= numCols || y >= numRows)
    return;

//marking column boundaries
if (x <= 2){                    
    d_Image[m].x = 255;
    d_Image[m].y = 0;
    d_Image[m].z = 0;
}
else if (x >= numCols-2){
    d_Image[m].x = 0;
    d_Image[m].y = 0;
    d_Image[m].z = 255;
}
else{
    d_Image[m].x = d_sample[m].x;
    d_Image[m].y = d_sample[m].y;
    d_Image[m].z = d_sample[m].z;
}
d_Image[m].w = d_sample[m].w;

Output using row-major
When accessing in column-major order:

int m = x * numRows + y;

Output using column-major
Dimensions:

const dim3 blockSize(16,16);
const dim3 gridSize(numCols/16+1, numRows/16+1, 1);
blur << < gridSize, blockSize >> >(d_Image, d_sample, numRows, numCols);

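I am using OpenCV to load and save the images.
In the first output (row-major), red and blue dots are scattered across the whole image. In the second output (column-major), the boundary rows are marked, whereas I was trying to mark the boundary columns. I am thoroughly confused.

Edit: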

void helper(uchar4* d_sample, uchar4* d_Image, size_t numRows, size_t numCols);

cv::Mat sample;
cv::Mat Image;

size_t numRows() { return sample.rows; }
size_t numCols() { return sample.cols; }

__global__ void blur(const uchar4 *d_sample, uchar4* d_Image, size_t numRows, size_t numCols){

  int x = threadIdx.x + blockDim.x * blockIdx.x;
  int y = threadIdx.y + blockDim.y * blockIdx.y;

  int m = y*numCols + x;                    

  if (x >= numCols || y >= numRows)
        return;

  if (x <= 2){
      d_Image[m].x = 255;
      d_Image[m].y = 0;
      d_Image[m].z = 0;
  }
  else if (x >= (numCols-2)){
      d_Image[m].x = 0;
      d_Image[m].y = 0;
      d_Image[m].z = 255;
  }
  else{
      d_Image[m].x = d_sample[m].x;
      d_Image[m].y = d_sample[m].y;
      d_Image[m].z = d_sample[m].z;
  }
  d_Image[m].w = d_sample[m].w;
  }

int main(){

  uchar4  *h_sample, *d_sample, *d_Image, *h_Image;
  int filter[9];
  sample = cv::imread("sample.jpg", CV_LOAD_IMAGE_COLOR);
  if (sample.empty()){
        std::cout << "error in loading image.";
        system("pause");
  }

  cv::cvtColor(sample,sample,CV_BGR2RGBA);
  Image.create(numRows(), numCols(), CV_8UC4);

  if (!sample.isContinuous() || !Image.isContinuous()) {
      std::cerr << "Images aren't continuous!! Exiting." << std::endl;
      system("pause");
      exit(1);
  }
  cv::cvtColor(Image,Image,CV_BGR2RGBA);

  h_sample = (uchar4*)sample.data;
  h_Image = (uchar4*)Image.data;

  size_t numPixels = numRows() * numCols();

    //allocate memory on device
  checkCudaErrors(cudaMalloc((void**)&d_sample, sizeof(uchar4) * numPixels));
  checkCudaErrors(cudaMalloc((void**)&d_Image, sizeof(uchar4) * numPixels));

  checkCudaErrors(cudaMemset(d_sample, 0, sizeof(uchar4) * numPixels));
  checkCudaErrors(cudaMemset(d_Image, 0, sizeof(uchar4) * numPixels));

//copy to device
  checkCudaErrors(cudaMemcpy(d_sample, h_sample, sizeof(uchar4) * numPixels, cudaMemcpyHostToDevice));

  helper(d_sample, d_Image, numCols(), numRows());

//copy back to  host
  checkCudaErrors(cudaMemcpy(h_Image, d_Image, sizeof(uchar4) * numPixels, cudaMemcpyDeviceToHost));

  cv::cvtColor(Image,Image,CV_RGBA2BGR);

  cv::namedWindow("Image", CV_WINDOW_AUTOSIZE);
  cv::imshow("Image", Image);
  cv::waitKey(0);
  cv::imwrite("sample.jpg", Image);

  return 0;
}

void helper(uchar4* d_sample, uchar4* d_Image, size_t numRows, size_t numCols){

  const dim3 blockSize(16,16);
  const dim3 gridSize(numCols/16+1, numRows/16+1, 1);
  blur << < gridSize, blockSize >> >(d_sample, d_Image, numRows, numCols);
  cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}
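
A side note on the launch configuration in helper: `numCols/16+1` always covers the image, but it launches one extra block per dimension whenever the size is an exact multiple of 16. The bounds check in the kernel keeps that harmless. A minimal sketch of the usual ceiling-division form follows; `blocksFor` is a hypothetical helper name, not part of the posted code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of blocks of size `block` needed to cover
// `n` elements, computed as a ceiling division.
inline std::size_t blocksFor(std::size_t n, std::size_t block) {
    assert(block > 0);
    return (n + block - 1) / block;
}
```

With this, the grid would be built as `dim3 gridSize(blocksFor(numCols, 16), blocksFor(numRows, 16), 1)`, still relying on the kernel's `x >= numCols || y >= numRows` check for the partial blocks at the edges.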

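【Comments】:

  • Are you sure you are viewing the resulting memory/image correctly? For example, could you be saving/displaying the image as column-major while everything else is row-major?
  • OpenCV stores Mat/image data in row-major order, if that helps...
  • AFAIK the device allocation underlying cv::cuda::GpuMat is a pitched allocation. I don't know whether that is what you are using, because you haven't shown complete code (questions asking "why isn't this code working?" are expected to include a minimal reproducible example). But the code you have posted doesn't seem to account for a pitched allocation. See slide 15 here
  • @RobertCrovella Please correct me. I have uploaded the complete code.
  • @Dharmendar: What "complete code" would that be?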
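
The two layouts mentioned in the comments above can be sketched as index arithmetic. This is an illustration under stated assumptions, not the posted code: `rowMajorIndex` and `pitchedByteOffset` are hypothetical helper names. The posted program allocates with plain `cudaMalloc` and checks `isContinuous()`, so the tightly packed formula is the one that applies to it. The pitched formula matters only when rows are padded, as with `cudaMallocPitch` or a `GpuMat` whose row stride is larger than the row's data.

```cpp
#include <cassert>
#include <cstddef>

// Flat element index of (row, col) in a tightly packed row-major image.
// This is the same arithmetic as the kernel's `y * numCols + x`.
inline std::size_t rowMajorIndex(std::size_t row, std::size_t col,
                                 std::size_t numCols) {
    assert(col < numCols);
    return row * numCols + col;
}

// Byte offset of (row, col) when consecutive rows start `pitchBytes` apart.
// `elemSize` is the size of one pixel in bytes (4 for a uchar4-like type).
inline std::size_t pitchedByteOffset(std::size_t row, std::size_t col,
                                     std::size_t pitchBytes,
                                     std::size_t elemSize) {
    assert((col + 1) * elemSize <= pitchBytes);
    return row * pitchBytes + col * elemSize;
}
```

When the pitch equals `numCols * elemSize`, both functions address the same pixel, which is why a continuous `cv::Mat` can be copied to a plain `cudaMalloc` buffer and indexed with `y * numCols + x`.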

Tags: c++ image opencv cuda


【Answer 1】:
void helper(uchar4* d_sample, uchar4* d_Image, size_t numRows, size_t numCols){

and then you call

helper(d_sample, d_Image, numCols(), numRows());

I think you may have swapped the columns and rows when calling helper...
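
To see why swapped dimensions would scatter the marks, here is a small CPU stand-in for the kernel's marking logic. It is a sketch for illustration only: `markLeftColumns` and `isMarked` are hypothetical names, and an `int` flag replaces the `uchar4` pixel. The buffer is filled with row-major indexing `y * numCols + x`, exactly as in the kernel, and then read back using the image's true width.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU stand-in for the kernel: returns a numRows * numCols buffer in which
// 1 marks every pixel whose column index x is <= 2 (row-major indexing).
std::vector<int> markLeftColumns(std::size_t numRows, std::size_t numCols) {
    std::vector<int> out(numRows * numCols, 0);
    for (std::size_t y = 0; y < numRows; ++y)
        for (std::size_t x = 0; x < numCols; ++x)
            if (x <= 2) out[y * numCols + x] = 1;
    return out;
}

// Reads pixel (row, col) from the buffer using the image's true width,
// which is how the host interprets the data after copying it back.
bool isMarked(const std::vector<int>& buf, std::size_t trueCols,
              std::size_t row, std::size_t col) {
    assert(row * trueCols + col < buf.size());
    return buf[row * trueCols + col] != 0;
}
```

For a 4-row by 8-column image, `markLeftColumns(4, 8)` marks only columns 0 to 2 of each row. `markLeftColumns(8, 4)`, the swapped call, also marks column 4 of row 0 when read back with the true width of 8, because the kernel believes each row is only 4 pixels long. If this is indeed the cause, passing the arguments in the declared order, `helper(d_sample, d_Image, numRows(), numCols());`, should restore the expected column boundaries.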
