【Posted】: 2015-03-31 07:36:34
【Question】:
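I'm writing an OpenCL program that runs on CPU + GPU, and I'm currently trying to save/cache the binaries after creating my program with clCreateProgramWithSource(). I create my cl_context and cl_program using CL_DEVICE_TYPE_ALL and build the source for those devices.
I then store the binaries to disk (one binary file per device) so that on subsequent launches my program automatically calls clCreateProgramWithBinary.
The problem is that if I save the binaries from a context created with CL_DEVICE_TYPE_ALL, the CPU binary is corrupted and clCreateProgramWithBinary raises an error.
To get all the binaries saved to disk correctly, I have to edit my code to run with CL_DEVICE_TYPE_CPU first and save the CPU binary on its own, then edit it again to run with CL_DEVICE_TYPE_GPU and save the GPU binary, and finally switch it back to CL_DEVICE_TYPE_ALL. If I do that, clCreateProgramWithBinary loads the binary for each device type correctly and my program runs.
So is this just a quirk of OpenCL, that I can't build binaries for the GPU and CPU together? Or am I doing something wrong?
My code is based on the binary-saving implementation found here: https://code.google.com/p/opencl-book-samples/source/browse/trunk/src/Chapter_6/HelloBinaryWorld/HelloBinaryWorld.cpp?r=42, modified to handle multiple devices.
Here is the relevant part of my code: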
/*----Initial setup of platform, context and devices---*/
cl_int err;
cl_uint deviceCount; // clGetDeviceIDs reports the device count through a cl_uint*
cl_device_id *devices;
cl_platform_id platform;
cl_context context;
cl_program program;
err = clGetPlatformIDs(1, &platform, NULL);
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, NULL, &deviceCount);
devices = new cl_device_id[deviceCount];
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, deviceCount, devices, NULL);
context = clCreateContext(NULL, deviceCount, devices, NULL, NULL, &err);
/*---Build Program---*/
const int numFiles = 2; // constant, so sourceStrings below is not a variable-length array
const char *sourceFiles[] =
{
"File1.cl",
"File2.cl",
};
char *sourceStrings[numFiles];
for(int i = 0; i < numFiles; i++)
{
sourceStrings[i] = ReadFile(sourceFiles[i]);
}
/*---Create the compute program from the source buffer---*/
program = clCreateProgramWithSource(context, numFiles, (const char **)sourceStrings, NULL, &err);
/*---Build the program executable---*/
err = clBuildProgram(program, deviceCount, devices, NULL, NULL, NULL);
/*----Save binary to disk---*/
//Determine the size of each program binary
size_t *programBinarySizes = new size_t[deviceCount];
err = clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, sizeof(size_t) * deviceCount, programBinarySizes, NULL);
if(err != CL_SUCCESS)
{
delete [] devices;
delete [] programBinarySizes;
return false;
}
unsigned char **programBinaries = new unsigned char*[deviceCount];
for(cl_uint i = 0; i < deviceCount; i++)
{
programBinaries[i] = new unsigned char[programBinarySizes[i]];
}
//Get all of the program binaries
err = clGetProgramInfo(program, CL_PROGRAM_BINARIES, sizeof(unsigned char *) * deviceCount, programBinaries, NULL);
if (err != CL_SUCCESS)
{
delete [] devices;
delete [] programBinarySizes;
for (cl_uint i = 0; i < deviceCount; i++)
{
delete [] programBinaries[i];
}
delete [] programBinaries;
return false; // bail out, as in the size query above, instead of writing freed buffers
}
//Store the binaries
for(cl_uint i = 0; i < deviceCount; i++)
{
// Store the binary for all devices
std::string currFile = binaryFile + to_string(i) + ".txt";
FILE *fp = fopen(currFile.c_str(), "wb");
if(!fp) continue; // skip this device if the file cannot be opened
fwrite(programBinaries[i], 1, programBinarySizes[i], fp);
fclose(fp);
}
// Cleanup
delete [] programBinarySizes;
for (cl_uint i = 0; i < deviceCount; i++)
{
delete [] programBinaries[i];
}
delete [] programBinaries;
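The save loop above names each file by array index, so the file only matches the right device if the binaries come back in the same order as my `devices` array. As far as I understand the spec, the array returned by CL_PROGRAM_BINARIES corresponds one-to-one with the device list returned by CL_PROGRAM_DEVICES, which is not guaranteed to be in that order. A minimal sketch of a more defensive save step follows, assuming the file is keyed on the device name (e.g. the string obtained via clGetDeviceInfo with CL_DEVICE_NAME). `CacheFileName` and `WriteBinary` are hypothetical helper names, not part of OpenCL or of the book sample, and the sketch uses only the C++ standard library.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: build a file-name-safe cache path from a device name.
// Every character that is not a letter or digit is replaced with '_'.
std::string CacheFileName(const std::string &prefix, const std::string &deviceName)
{
    std::string safe;
    for (unsigned char c : deviceName)
    {
        safe += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    return prefix + safe + ".bin";
}

// Hypothetical helper: write one binary blob to disk.
// Returns false if the file cannot be opened, written in full, or closed.
bool WriteBinary(const std::string &path, const unsigned char *data, size_t size)
{
    FILE *fp = std::fopen(path.c_str(), "wb");
    if (!fp)
    {
        return false;
    }
    size_t written = std::fwrite(data, 1, size, fp);
    bool closed = (std::fclose(fp) == 0);
    return closed && written == size;
}
```

With helpers like these, the loop would call `WriteBinary(CacheFileName(binaryFile, name), programBinaries[i], programBinarySizes[i])`, where `name` is the name of the i-th device reported by CL_PROGRAM_DEVICES.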
Then, on subsequent runs of my code, this function is called to create the program from the binaries:
unsigned char **programBinaries = new unsigned char *[deviceCount];
size_t sizes[deviceCount];
for(int i = 0; i < deviceCount; i++)
{
string currFile = binaryFile + to_string(i) + ".txt";
FILE *fp = fopen(currFile.c_str(), "rb");
if(!fp) return NULL;
size_t binarySize;
fseek(fp, 0, SEEK_END);
binarySize = ftell(fp);
sizes[i] = binarySize;
rewind(fp);
programBinaries[i] = new unsigned char[binarySize];
fread(programBinaries[i], 1, binarySize, fp);
fclose(fp);
}
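cl_int errNum = 0;
cl_program program;
// binary_status needs one entry per device; a single cl_int is overrun when deviceCount > 1
cl_int *binaryStatus = new cl_int[deviceCount];
program = clCreateProgramWithBinary(context,
deviceCount,
devices,
sizes,
(const unsigned char **)programBinaries,
binaryStatus,
&errNum);
// Free each binary buffer as well as the pointer array
for(int i = 0; i < deviceCount; i++)
{
delete [] programBinaries[i];
}
delete [] programBinaries;
delete [] binaryStatus; // inspect binaryStatus[i] before this point to see which device failed
errNum = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);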
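The loading loop above does not check the results of ftell or fread, so a short read or a missing file for one device would go unnoticed until clCreateProgramWithBinary fails. The sketch below shows one way the read could be checked, assuming a hypothetical helper named `ReadBinary` that fills a std::vector. It is plain standard C++ and is not taken from the book sample.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: load a cached binary into `out`.
// Returns false if the file is missing, its size cannot be determined,
// or fewer bytes are read than the file size reported.
bool ReadBinary(const std::string &path, std::vector<unsigned char> &out)
{
    FILE *fp = std::fopen(path.c_str(), "rb");
    if (!fp)
    {
        return false;
    }
    if (std::fseek(fp, 0, SEEK_END) != 0)
    {
        std::fclose(fp);
        return false;
    }
    long size = std::ftell(fp);
    if (size < 0)
    {
        std::fclose(fp);
        return false;
    }
    std::rewind(fp);
    out.resize(static_cast<size_t>(size));
    size_t got = 0;
    if (size > 0)
    {
        got = std::fread(out.data(), 1, out.size(), fp);
    }
    std::fclose(fp);
    return got == out.size();
}
```

Each device's buffer can then be passed on as `out.data()` with length `out.size()`, and a `false` return can trigger a rebuild from source instead of handing a truncated binary to the driver.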
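【Comments】:
-
Have you examined (or could you provide) sample binaries for the CPU and GPU (e.g. for a simple "vector add" kernel), once from the working version and once from the non-working version? Do they differ significantly, or only by a byte or so? Are the file sizes the same? (In any case, I might try it out later today and see whether I can reproduce the error.)
-
On some platforms (at least on ARM), you need to create all the kernels in the binary for it to actually be compiled; otherwise it stays in LLVM format. Can you post your system setup?
Tags: opencl gpu cpu binaryfiles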