是的,您可以在 GPU 上启动具有不同网格大小/配置的内核。无论您启动的是同一个内核,这都是正确的:
dim3 dimBlock(block_x, block_y);
dim3 dimGrid1(N*M/dimBlock.x, N*M/dimBlock.y);
mykernel<<<dimGrid1, dimBlock>>>(...);
dim3 dimGrid2(N/dimBlock.x, N/dimBlock.y);
mykernel<<<dimGrid2, dimBlock>>>(...);
或不同的内核:
dim3 dimBlock(block_x, block_y);
dim3 dimGrid1(N*M/dimBlock.x, N*M/dimBlock.y);
mykernel1<<<dimGrid1, dimBlock>>>(...);
dim3 dimGrid2(N/dimBlock.x, N/dimBlock.y);
mykernel2<<<dimGrid2, dimBlock>>>(...);
还有一个叫做CUDA dynamic parallelism的东西可以让你做以下事情,但它需要一个计算能力为3.5的GPU:
__global__ void kernel2(...){
...
}
__global__ void kernel1(...){
...
dim3 dimBlock(block_x, block_y);
dim3 dimGrid(N/dimBlock.x, N/dimBlock.y);
kernel2<<<dimGrid, dimBlock>>>(...);
...
}
通过动态并行,甚至可以递归调用同一个内核:
__global__ void kernel1(...){
...
dim3 dimBlock(block_x, block_y);
dim3 dimGrid(N/dimBlock.x, N/dimBlock.y);
kernel1<<<dimGrid, dimBlock>>>(...);
...
}