Answers to: PyCuda program keeps on running

【Question title】: PyCuda program keeps on running
【Posted】: 2018-06-27 23:54:54
【Question description】:

        answer_array = np.zeros_like(self.redarray)
        answer_array_gpu = cuda.mem_alloc(answer_array.nbytes)
        redarray_gpu = cuda.mem_alloc(self.redcont.nbytes)
        greenarray_gpu = cuda.mem_alloc(self.greencont.nbytes)
        bluearray_gpu = cuda.mem_alloc(self.bluecont.nbytes)
        cuda.memcpy_htod(redarray_gpu, self.redcont)
        cuda.memcpy_htod(greenarray_gpu, self.greencont)
        cuda.memcpy_htod(bluearray_gpu, self.bluecont)
        cuda.memcpy_htod(answer_array_gpu, answer_array)

        desaturate_mod = SourceModule("""
            __global__ void array_desaturation(float *a, float *b, float *c, float *d){
                int index = blockIdx.x * blockDim.x + threadIdx.x;
                d[index] = ((a[index] + b[index] + c[index])/3);
            }
        """)

        func = desaturate_mod.get_function("array_desaturation")
        func(redarray_gpu, greenarray_gpu, bluearray_gpu, answer_array_gpu,
             block=(self.gpu_threads, self.gpu_threads, self.blocks_to_use))
        desaturated = np.empty_like(self.redarray)
        cuda.memcpy_dtoh(desaturated, answer_array_gpu)
        print(desaturated)
        print("Up to here")

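I wrote this code to compute the average of three arrays and store it in a fourth array. The code neither prints the result nor reaches the line that prints "Up to here". What could be the error?

Additional info: redarray, greenarray and bluearray are float32 numpy arrays.

【Question comments】:

Tags: python-3.x matrix pycharm anaconda pycuda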

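【Solution 1】:

I know that getting started with arrays in C, and especially in PyCUDA, can be quite tricky; it took me months to get a 2D sliding-maximum algorithm working.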

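In this kernel you are not working with Python-style array objects: each argument is a pointer to the memory address of the first element of an array, and an element is reached as an offset from that pointer. A useful example of how this works in C can be found here. You also have to pass in the length of the arrays (assuming they are all equal) so that no thread goes out of bounds; if the lengths differ, pass each one separately.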
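
To make the bounds point concrete, here is a small CPU-only sketch in plain Python (no GPU needed). It mimics how a 1D launch assigns `blockIdx.x * blockDim.x + threadIdx.x` to each thread, and shows why some threads land past the end of the arrays unless the length is passed in and checked. The helper names `launch_config` and `global_indices` and the block size of 256 are illustrative assumptions, not part of PyCUDA.

```python
def launch_config(n_elements, threads_per_block=256):
    """Return (grid_size, block_size) for a 1D launch covering n_elements."""
    # Ceiling division: enough blocks so that grid_size * block_size >= n_elements.
    grid_size = (n_elements + threads_per_block - 1) // threads_per_block
    return grid_size, threads_per_block


def global_indices(grid_size, block_size):
    """Enumerate the index each thread would compute inside the kernel."""
    return [block_idx * block_size + thread_idx
            for block_idx in range(grid_size)
            for thread_idx in range(block_size)]


n = 1000
grid, block = launch_config(n)
indices = global_indices(grid, block)

# Threads whose index is >= n must return early, as a bounds check in the kernel would do.
out_of_bounds = [i for i in indices if i >= n]
print(grid, block, len(indices), len(out_of_bounds))
```

Because the grid is rounded up to whole blocks, the launch almost always creates more threads than there are elements, which is why the kernel needs the array length as an argument.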

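Hopefully that link shows how to access array elements through pointers in C. @talonmies also gives a good example here of how to pass in 2D arrays (which works the same as for 1D arrays, because a 2D array is flattened to 1D in GPU memory). However, when I worked on this I never used striding the way @talonmies does, because, as the TutorialsPoint tutorial states, *(pointer_to_array + index) is correct. Providing the memory stride here would take you out of bounds.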
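
The flattening described here can be checked on the CPU with NumPy. This sketch assumes a C-contiguous (row-major) float32 array, which is NumPy's default layout; the shape and variable names are made up for illustration. It also shows that NumPy strides are measured in bytes, so using them directly as element offsets on a `float *` is one plausible way to end up out of bounds.

```python
import numpy as np

height, width = 3, 4
image = np.arange(height * width, dtype=np.float32).reshape(height, width)

# A C-contiguous 2D array is laid out row after row in memory.
flat = image.ravel()

row, col = 2, 1
flat_index = row * width + col  # element offset, as used with *(ptr + index)
assert flat[flat_index] == image[row, col]

# NumPy strides are in bytes, not elements.
byte_strides = image.strides
element_strides = tuple(s // image.itemsize for s in byte_strides)

# Treating byte strides as element offsets points past the last element.
wrong_index = row * byte_strides[0] + col * byte_strides[1]
print(flat_index, wrong_index, flat.size)
```

If strides are needed in a kernel that takes `float *` arguments, dividing them by the item size first (as `element_strides` does) keeps the offsets in units of elements.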

So my code would look more like this:

import numpy
import pycuda.autoinit  # creates a CUDA context on the default device
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

C_Code = """
            __global__ void array_desaturation(float *array_A, float *array_B, float *array_C, float *outputArray, int arrayLengths){
                int index = blockIdx.x * blockDim.x + threadIdx.x;
                if(index >= arrayLengths){ // Threads past the end of the arrays must not touch memory; an out-of-bounds access causes serious, unpredictable problems
                    return;
                }

                // Read the values at this index as offsets from each array's base address (you could leave this part out, but I like the legibility)
                float aValue = *(array_A + index);
                float bValue = *(array_B + index);
                float cValue = *(array_C + index);

                *(outputArray + index) = ((aValue + bValue + cValue)/3); // Set the value at (output array's pointer + offset) to the average
            }"""


desaturate_mod = SourceModule(C_Code)
desaturate_kernel = desaturate_mod.get_function("array_desaturation")

# array_A, array_B and array_C must already be defined as contiguous float32 numpy arrays of equal size
outputArray = numpy.empty_like(array_A)
blockSize = 256                                           # threads per block (example value)
gridSize = (array_A.size + blockSize - 1) // blockSize    # enough blocks to cover every element

desaturate_kernel(cuda.In(array_A),                    # Input
                  cuda.In(array_B),                    # Input
                  cuda.In(array_C),                    # Input
                  cuda.Out(outputArray),               # Output
                  numpy.int32(array_A.size),           # Number of elements, if all arrays are the same size
                  block=(blockSize, 1, 1),             # Use any launch shape you like for these two parameters, but change the index computation accordingly
                  grid=(gridSize, 1, 1)
                  )

print(outputArray) # Done!
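
One way to sanity-check the kernel's result is to compute the same average on the CPU with NumPy and compare. The sketch below is only an assumption about how one might test it: the helper names `reference_average` and `matches_reference`, the random inputs and the tolerance are all made up, and `stand_in_output` takes the place of the array the kernel would fill on a machine with a GPU.

```python
import numpy as np


def reference_average(a, b, c):
    """CPU reference: element-wise mean of three float32 arrays."""
    return (a + b + c) / np.float32(3)


def matches_reference(output, a, b, c, rtol=1e-5):
    """Return True if `output` agrees with the CPU reference within rtol."""
    return bool(np.allclose(output, reference_average(a, b, c), rtol=rtol))


rng = np.random.default_rng(0)
n = 1000
array_A = rng.random(n, dtype=np.float32)
array_B = rng.random(n, dtype=np.float32)
array_C = rng.random(n, dtype=np.float32)

# Stand-in for the GPU result: the same per-element arithmetic the kernel performs.
stand_in_output = np.empty_like(array_A)
for index in range(n):
    stand_in_output[index] = (array_A[index] + array_B[index] + array_C[index]) / np.float32(3)

print(matches_reference(stand_in_output, array_A, array_B, array_C))
```

With a real GPU run, passing the array filled through `cuda.Out` in place of `stand_in_output` gives a quick check that every element was actually written.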

【Comments】: