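[Posted]: 2021-08-12 17:59:09
[Question]:
I'm working on an assignment for my university; the main idea is to compare CUDA data parallelism with CUDA task parallelism. I came up with the idea of parallelizing Conway's Game of Life. The problem is that I can't figure out how to navigate a 2D array in multiple directions in CUDA, i.e. up/down/right/left plus the corners around the cell the kernel is evaluating.
So far I've come up with the following:
First kernel code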
//counts the alive cells around each cell and saves the count into an array
__global__ void numAliveAround(int *oldBoard, int *newBoard, int xSize, int ySize, size_t pitchOld, size_t pitchNew)
{
    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
    if (x < xSize && y < ySize)
    {
        int outputNumber = 0;
        //cell above
        //xMod is to make sure the number wraps when it overflows the board
        int xMod = ((x + 1) % xSize + xSize) % xSize;
        //idx calculation
        int idx = xMod * xSize + y;
        outputNumber += oldBoard[idx];
        //more of the same code, just for cell under, left, right, and corners
        newBoard[x * xSize + y] = outputNumber;
    }
}
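One way to reach all eight neighbours is to loop over the offsets -1..1 in both directions and wrap each coordinate with a modulo. The sketch below assumes a densely packed row-major board (no pitch padding) with x as the column and y as the row. The names wrapCoord and countAliveNeighbours and the __host__/__device__ shim are illustrative and not from the original post. Inside a kernel, x and y would come from blockIdx, blockDim and threadIdx as in the code above.

```cpp
#include <cassert>
#include <cstddef>

// Allow the helpers to compile as plain C++ when nvcc is not used.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Maps any coordinate in [-size, 2*size) back into [0, size).
__host__ __device__ inline int wrapCoord(int v, int size)
{
    return (v % size + size) % size;
}

// Counts live cells among the 8 neighbours of (x, y) on a wrapping board.
// Assumption: the board is dense row-major, so cell (x, y) is at y * width + x.
__host__ __device__ inline int countAliveNeighbours(const int* board, int x, int y,
                                                    int width, int height)
{
    assert(width > 0 && height > 0);
    int alive = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;       // skip the cell itself
            int nx = wrapCoord(x + dx, width);      // wrap columns
            int ny = wrapCoord(y + dy, height);     // wrap rows
            alive += board[static_cast<std::size_t>(ny) * width + nx];
        }
    }
    return alive;
}
```

Wrapping x against the width and y against the height keeps the arithmetic valid for non-square boards, where mixing up the two sizes is an easy mistake to make.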
Second kernel code
//sets new cell status according to the number of alive cells around
__global__ void determineNextState(int *board, int *newBoard, int xSize, int ySize, size_t pitchOld, size_t pitchNew)
{
    //getting threads
    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
    if (x < xSize && y < ySize)
    {
        int idxNew = x * xSize + y;
        int idxOld = x * xSize + y;
        int state = board[idxOld];
        //ALIVE = 1, DEAD = 0;
        int output = DEAD;
        //checking if any alive condition is met
        if (state == ALIVE)
        {
            if ((newBoard[idxNew] == 2 || newBoard[idxNew] == 3))
            {
                output = ALIVE;
            }
        }
        else
        {
            if (newBoard[idxNew] == 3)
            {
                output = ALIVE;
            }
        }
        newBoard[idxNew] = output;
    }
}
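The rule in this kernel depends only on the cell's current state and its neighbour count, so it can be factored into a small pure function that is easy to check on the host. This is a minimal sketch. The names nextState, kAlive and kDead are hypothetical and simply mirror the ALIVE = 1, DEAD = 0 convention from the comment above.

```cpp
#include <cassert>

// Allow the helper to compile as plain C++ when nvcc is not used.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

constexpr int kDead = 0;   // assumed to match DEAD in the post
constexpr int kAlive = 1;  // assumed to match ALIVE in the post

// Conway's rule: a live cell survives with 2 or 3 live neighbours,
// a dead cell becomes alive with exactly 3; everything else is dead.
__host__ __device__ inline int nextState(int state, int aliveNeighbours)
{
    assert(aliveNeighbours >= 0 && aliveNeighbours <= 8);
    if (state == kAlive) {
        return (aliveNeighbours == 2 || aliveNeighbours == 3) ? kAlive : kDead;
    }
    return (aliveNeighbours == 3) ? kAlive : kDead;
}
```

A kernel thread could then write newBoard[idx] = nextState(board[idx], count) once it has the count for its own cell.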
Kernel launch function
void SendToCUDA(int oldBoard[COLUMNS][ROWS], int newBoard[COLUMNS][ROWS])
{
    //CUDA pointers
    int *d_oldBoard;
    int *d_newBoard;
    size_t pitchOld;
    size_t pitchNew;
    cudaMallocPitch(&d_oldBoard, &pitchOld, COLUMNS * sizeof(int), ROWS);
    cudaMallocPitch(&d_newBoard, &pitchNew, COLUMNS * sizeof(int), ROWS);
    cudaMemcpy2D(d_oldBoard, pitchOld, oldBoard, COLUMNS * sizeof(int), COLUMNS * sizeof(int), ROWS, cudaMemcpyHostToDevice);
    dim3 grid(divideAndRound(COLUMNS, BLOCKSIZE_X), divideAndRound(ROWS, BLOCKSIZE_Y));
    dim3 block(BLOCKSIZE_Y, BLOCKSIZE_X);
    printf("counting \n");
    numAliveAround <<<block, grid>>> (d_oldBoard, d_newBoard, COLUMNS, ROWS, pitchOld, pitchNew);
    cudaDeviceSynchronize();
    printf("determining \n");
    determineNextState <<<block, grid>>> (d_oldBoard, d_newBoard, COLUMNS, ROWS, pitchOld, pitchNew);
    cudaDeviceSynchronize();
    //using newBoard later (outside the function) to display the Board
    cudaMemcpy2D(newBoard, COLUMNS * sizeof(int), d_newBoard, pitchNew, COLUMNS * sizeof(int), ROWS, cudaMemcpyDeviceToHost);
    cudaFree(d_oldBoard);
    cudaFree(d_newBoard);
}
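For the launch itself, CUDA's execution configuration takes the grid dimensions first and the block dimensions second (kernel<<<grid, block>>>(...)), and the x component of both should refer to the same axis. Both points are worth double-checking against the launches above. Note also that a host array declared as int[COLUMNS][ROWS] has rows of ROWS elements, which matches a source pitch of COLUMNS * sizeof(int) only when the board is square. The sketch below shows just the launch geometry. Dim2 is a hypothetical stand-in for dim3 so the code compiles without the CUDA toolkit, and ceilDiv plays the role that divideAndRound presumably plays in the post.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, limited to two dimensions.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Ceiling division: the smallest number of blocks of size d that cover n items.
inline unsigned ceilDiv(unsigned n, unsigned d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}

// Grid size needed so that grid * block covers a width x height board.
// The x axis of the grid is paired with the x axis of the block.
inline Dim2 gridFor(unsigned width, unsigned height, Dim2 block)
{
    return { ceilDiv(width, block.x), ceilDiv(height, block.y) };
}

// With the CUDA toolkit, the equivalent would be roughly:
//   dim3 block(BLOCKSIZE_X, BLOCKSIZE_Y);
//   dim3 grid(ceilDiv(COLUMNS, block.x), ceilDiv(ROWS, block.y));
//   numAliveAround<<<grid, block>>>(...);
```

Because the grid is rounded up, some threads fall outside the board, which is why the kernels keep the x < xSize && y < ySize guard.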
I've found several ways of accessing a flattened 2D array, some of which contradict each other, for example:
//what is usually used as an explanation
idx = x * width + y;
//sometimes x and y are swapped
idx = y * width + x;
//what works with simple access
int *value = (int *)((char *)(d_matrix + y * pitch)) + x;
//or
idx = x * xDim + y + pitch;
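Interestingly, when I only access a single point in the array (e.g. incrementing every value in it by 1), the last 2 work, but they don't work at all for more complex navigation. At this point I've been sitting on this problem for quite a while, so any kind of insight would be very helpful.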
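The variants above can be reconciled if x is taken as the column and y as the row. On a densely packed row-major array, element (x, y) sits at y * width + x. On an allocation from cudaMallocPitch, the pitch is the row stride in bytes, so it is multiplied by the row number on a char pointer, and only then is the pointer cast back to int* and offset by the column. Adding the pitch to an index merely shifts it by a constant. The sketch below puts the two addressing schemes side by side. The helper names denseIndex and pitchedCell are illustrative, not from the original post.

```cpp
#include <cassert>
#include <cstddef>

// Allow the helpers to compile as plain C++ when nvcc is not used.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Dense row-major layout: rows are stored back to back, `width` ints each.
__host__ __device__ inline std::size_t denseIndex(int x, int y, int width)
{
    assert(x >= 0 && y >= 0 && width > 0);
    return static_cast<std::size_t>(y) * width + x;
}

// Pitched layout: each row starts pitchBytes after the previous one.
// The byte offset is applied on a char pointer BEFORE casting back to int*.
__host__ __device__ inline int* pitchedCell(int* base, int x, int y, std::size_t pitchBytes)
{
    assert(x >= 0 && y >= 0);
    char* row = reinterpret_cast<char*>(base) + static_cast<std::size_t>(y) * pitchBytes;
    return reinterpret_cast<int*>(row) + x;
}
```

When pitchBytes equals width * sizeof(int), both helpers address the same element; with padded rows only the pitched form stays correct, which may explain why a formula can appear to work for simple element-wise access and still fail for neighbour lookups.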
[Discussion]: