[Posted]: 2015-03-29 07:24:26
[Question]:
When I debug with Nsight, I see that only half of the shared memory array has been assigned by s_f[sidx] = 5;
__global__ void BackProjectPixel(double* val,
                                 double* projection,
                                 double* focalPtPos,
                                 double* pxlPos,
                                 double* pxlGrid,
                                 double* detPos,
                                 double* detGridPos,
                                 unsigned int nN,
                                 unsigned int nS,
                                 double perModDetAngle,
                                 double perModSpaceAngle,
                                 double perModAngle)
{
    const double fx = focalPtPos[0];
    const double fy = focalPtPos[1];
    //extern __shared__ double s_f[64];
    __shared__ double s_f[64];
    // Grid-wide pixel coordinates and linear index
    unsigned int i = (blockIdx.x * blockDim.x) + threadIdx.x;
    unsigned int j = (blockIdx.y * blockDim.y) + threadIdx.y;
    unsigned int idx = j*nN + i;
    // Linear index of this thread within its block
    unsigned int sidx = threadIdx.y * blockDim.x + threadIdx.x;
    unsigned int threadsPerSharedMem = 64;
    if (sidx < threadsPerSharedMem)
    {
        s_f[sidx] = 5;
    }
    __syncthreads();
    //double * angle;
    if (sidx < threadsPerSharedMem)
    {
        // NOTE: s_f has only 64 elements, but idx is a grid-wide index and can
        // exceed 63; sidx is probably what was intended as the subscript here.
        s_f[idx] = TriPointAngle(detGridPos[0], detGridPos[1], fx, fy, pxlPos[idx*2], pxlPos[idx*2+1], nN);
    }
}
This is what I observed.
I would like to know why there are only 32 fives. Shouldn't s_f contain sixty-four 5s? Thanks.
[Discussion]:
-
When did you observe this? 32 is the warp size. Are you sure you are not just looking at the result of a partially executed kernel?
-
@talonmies I observed it when the breakpoint was first hit and stepped over. The breakpoint is set at s_f[sidx] = 5;
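A minimal sketch of the point raised in the comments, not the asker's actual kernel. It assumes a single 8x8 thread block (the post does not state the launch configuration), and the names fillShared and countFives are hypothetical. A 64-thread block is made up of two warps of 32 threads, so a debugger stopped at the assignment may show only one warp's writes. Reading the array after __syncthreads() should show all 64 entries.

```cuda
#include <cuda_runtime.h>

// Hypothetical demo kernel: every thread of a 64-thread block writes 5 into
// shared memory, then the contents are copied out after the barrier.
__global__ void fillShared(double* out)
{
    __shared__ double s_f[64];
    unsigned int sidx = threadIdx.y * blockDim.x + threadIdx.x;
    if (sidx < 64)
    {
        // Warp 0 (sidx 0..31) and warp 1 (sidx 32..63) reach this line separately.
        s_f[sidx] = 5.0;
    }
    __syncthreads();  // all threads of the block have finished their write here
    if (sidx < 64)
    {
        out[sidx] = s_f[sidx];
    }
}

// Launches one 8x8 block (assumed configuration) and returns how many of the
// 64 entries equal 5, or -1 if a CUDA call fails (for example, no device).
int countFives()
{
    double* d_out = nullptr;
    double h_out[64] = {0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;
    fillShared<<<1, dim3(8, 8)>>>(d_out);
    // cudaMemcpy waits for the kernel and reports any launch or execution error.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;
    int count = 0;
    for (int k = 0; k < 64; ++k)
    {
        if (h_out[k] == 5.0) ++count;
    }
    return count;
}
```

Calling countFives() from a host main and printing the result gives a quick check that does not depend on when the debugger happens to stop.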
Tags: cuda shared-memory nsight