【发布时间】:2014-12-21 13:56:34
【问题描述】:
我正在使用 OpenCL(矩阵乘法的一种变体)将工作卸载到 GPU。矩阵代码本身工作得非常好,但是将数据移动到 GPU 的成本是令人望而却步的。
我已从使用 clEnqueueRead/clEnqueueWrite 转移到内存映射缓冲区,如下所示:
d_a = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_ALLOC_HOST_PTR,
sizeof(char) * queryVector_size,
NULL, NULL);
checkErr(err,"Buf A");
d_b = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_ALLOC_HOST_PTR,
sizeof(char) * segment_size,
NULL, NULL);
checkErr(err,"Buf B");
err = clSetKernelArg(ko_smat, 0, sizeof(cl_mem), &d_c);
checkErr(err,"Compute Kernel");
err = clSetKernelArg(ko_smat, 1, sizeof(cl_mem), &d_a);
checkErr(err,"Compute Kernel");
err = clSetKernelArg(ko_smat, 2, sizeof(cl_mem), &d_b);
checkErr(err,"Compute Kernel");
query_vector = (char*) clEnqueueMapBuffer(commands, d_a, CL_TRUE,CL_MAP_READ, 0, sizeof(char) * queryVector_size, 0, NULL, NULL, &err);
checkErr(err,"Write A");
segment_data = (char*) clEnqueueMapBuffer(commands, d_b, CL_TRUE,CL_MAP_READ, 0, sizeof(char) * segment_size, 0, NULL, NULL, &err);
checkErr(err,"Write B");
// code which initialises buffers using ptrs (segment_data and queryV)
err = clEnqueueUnmapMemObject(commands,
d_a,
query_vector, 0, NULL, NULL);
checkErr(err,"Unmap Buffer");
err = clEnqueueUnmapMemObject(commands,
d_b,
segment_data, 0, NULL, NULL);
checkErr(err,"Unmap Buff");
err = clEnqueueNDRangeKernel(commands, ko_smat, 2, NULL, globalWorkItems, localWorkItems, 0, NULL, NULL);
err = clFinish(commands);
checkErr(err, "Execute Kernel");
result = (char*) clEnqueueMapBuffer(commands, d_c, CL_TRUE,CL_MAP_WRITE, 0, sizeof(char) * result_size, 0, NULL, NULL, &err);
checkErr(err,"Write C");
printMatrix(result, result_row, result_col);
当我使用 ReadEnqueue/WriteEnqueue 方法并通过它初始化 d_a、d_b、d_c 时,此代码工作正常,但是当我使用 MappedBuffers 时,由于 d_a 和 d_b 为空,结果为 0 运行内核时。
映射/取消映射缓冲区的适当方法是什么?
编辑: 核心问题似乎来自这里
segment_data = (char*) clEnqueueMapBuffer(commands, d_b, CL_TRUE,CL_MAP_READ, 0, sizeof(char) * segment_width * segment_length, 0, NULL, NULL, &err);
// INITIALISE
printMatrix(segment_data, segment_length, segment_width);
// ALL GOOD
err = clEnqueueUnmapMemObject(commands,
d_b,
segment_data, 0, NULL, NULL);
checkErr(err,"Unmap Buff");
segment_data = (char*) clEnqueueMapBuffer(commands, d_b, CL_TRUE,CL_MAP_READ, 0, sizeof(char) * segment_width * segment_length, 0\
, NULL, NULL, &err);
printMatrix(segment_data, segment_length, segment_width);
// ALL ZEROs again
第一个 printMatrix() 返回正确的输出,一旦我取消映射并重新映射它,segment_data 就变成全 0(它是初始值)。我怀疑我在某处使用了不正确的标志?我不知道在哪里。
【问题讨论】: