Passing 3D arrays to a convolution function in C

【Question title】: Passing 3d arrays to a convolution function in C
【Posted】: 2021-10-05 21:53:33
【Question】:

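I need to write a function that performs a 2D convolution, and for that I need to pass several 3D arrays to it. However, I have been told that my approach is not the ideal way to do this.

First, I declare the types: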

typedef struct {
    float img[224][224][3];
} input_224_t;

typedef struct {
    float img[112][112][32];
} input_112_t;

typedef struct {
    float img[3][3][32];
} weightsL1_t;

Then the convolution looks like this:

void convolution(input_224_t* N, weightsL1_t* M, input_112_t* P, int size, int ksize, int channels, int filters, int stride)
{
    // Effectively pads the image before convolution. Technically also works for pointwise, but it's inefficient.
    // find center position of kernel (half of kernel size)
    int kcenter = ksize / 2;

    // Declare output indexes
    int a = 0;
    int b = -1;

    for (int k = 0; k < filters; ++k)                   // filters
    {
        for (int i = 0; i < size; i = i + stride)       // rows
        {
            for (int j = 0; j < size; j = j + stride)   // columns
            {
                b++;
                if (b == ksize) {b=0;a++;}              // Increment output index
                for (int m = 0; m < ksize; ++m)         // kernel rows
                {
                    for (int n = 0; n < ksize; ++n)     // kernel columns
                    {
                        // Index of input signal, used for checking boundary
                        int ii = i + (m - kcenter);
                        int jj = j + (n - kcenter);

                        // Ignore input samples which are out of bound
                        if (ii >= 0 && ii < size && jj >= 0 && jj < size) {
                            for (int p = 0; p < channels; ++p)  // channels
                            {
                                P.img[a][b][k] += N.img[ii][jj][p] * M.img[m][n][k];    // convolve
                            }
                        }
                    }
                }
            }
        }
    }
}

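(This produces "field 'img' could not be resolved" on the "convolve" line.)

I then load the values into the right structs (that was a previous question of mine, already answered: Write values to a 3D array inside a struct in C), and I call the function like this: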

convolution(test_image, test_filter, test_result, 6, 3, 1, 1, 2);

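In that previous question I was told that this is not the ideal way to handle 3D arrays and that it may use more memory than I expect. This is a very memory-intensive process that will run on an embedded system, so optimizing memory allocation is essential.

If possible, my goal is to have only one of these 3D arrays allocated at any point in time, so that no unnecessary memory is used, and to allocate it in a way that lets the space be freed later.

Thank you in advance.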

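【Comments】:

The kernel should be 4-dimensional: float[3][3][3][32]. Shouldn't each output channel use a separate kernel?
Don't use _t as a typedef suffix. gnu.org/software/libc/manual/html_node/Reserved-Names.html
@tstanisl That's right, a 2D convolution has a 4-dim kernel. I looked at the weights I generated and assumed otherwise, but you are right.
@WilliamPursell Really? I was told it was good practice; apparently the opposite is true. Thanks for telling me.

Tags: c multidimensional-array convolution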

【Answer 1】:

You can use variable-length arrays as function parameters.

void convolve(int isize,  // width/height of input (224)
              int osize,  // width/height of output (112)
              int ksize,  // width/height of kernel (3)
              int stride, // shift between input pixels, between consecutive outputs
              int pad,    // offset between (0,0) pixels between input and output
              int idepth, int odepth, // number of input and output channels
              float idata[isize][isize][idepth],
              float odata[osize][osize][odepth],
              float kdata[idepth][ksize][ksize][odepth])

{
  // iterate over the output
  for (int oy = 0; oy < osize; ++oy) {
  for (int ox = 0; ox < osize; ++ox) {
  for (int od = 0; od < odepth; ++od) {
      odata[oy][ox][od] = 0;
      for (int ky = 0; ky < ksize; ++ky) {
      for (int kx = 0; kx < ksize; ++kx) {
          // map position in output and kernel to the input
          int iy = stride * oy + ky - pad;
          int ix = stride * ox + kx - pad;
          // use only valid inputs
          if (iy >= 0 && iy < isize && ix >= 0 && ix < isize)
              for (int id = 0; id < idepth; ++id)
                  odata[oy][ox][od] += kdata[id][ky][kx][od] * idata[iy][ix][id];
      }}
  }}}
}

Typical usage:

// allocate input
float (*idata)[224][3] = calloc(224, sizeof *idata);
// fill input using idata[y][x][d] syntax

// allocate kernel
float (*kdata)[3][3][32] = calloc(3, sizeof *kdata);
// fill kernel

// allocate output
float (*odata)[112][32] = calloc(112, sizeof *odata);

convolve(224, 112, 3, // input, output, kernel size
         2, // stride
         1, // pad input by one pixel, which centers the kernel
         3, 32, // number of input and output channels
         idata, odata, kdata);

// free memory if it is no longer used
free(idata); free(odata); free(kdata);
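
The call above passes the output size (112) explicitly. As a rough sketch, assuming the same amount of zero padding on both sides of each axis and the usual floor-division convention, the output size can be derived from the other parameters. The helper name conv_output_size is hypothetical and is not part of the answer.

```c
/* Hypothetical helper, not part of the answer above.
 * Number of output positions along one axis, assuming `pad` zero pixels
 * on each side of the input. Integer division rounds down, so a window
 * that would not fit completely is dropped. */
int conv_output_size(int isize, int ksize, int stride, int pad)
{
    return (isize + 2 * pad - ksize) / stride + 1;
}
```

For the parameters used in the call above, conv_output_size(224, 3, 2, 1) evaluates to 112.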

A multidimensional array can be allocated like this:

float (*arr)[10][20][30] = malloc(sizeof *arr);

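However, accessing elements is a bit cumbersome because of the (*arr)[i][j][k] syntax. It is therefore simpler to use a pointer to the first element of the array and allocate several subarrays through that pointer.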

float (*arr)[20][30] = malloc(10 * sizeof *arr);

Or use calloc() to get zero-initialized memory and to avoid overflow in the size computation.

float (*arr)[20][30] = calloc(10, sizeof *arr);
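
A minimal sketch of how such an allocation is filled and read back, assuming small illustrative sizes (H, W, D) and a hypothetical function name demo_fill. Elements are addressed with plain [y][x][d] indexing, where y selects the row in the usual image convention, and the whole array lives in one contiguous block.

```c
#include <stdlib.h>

enum { H = 4, W = 5, D = 3 };   /* small illustrative sizes, not from the answer */

/* Hypothetical demo: returns 0 on success, -1 on failure. */
int demo_fill(void)
{
    /* One contiguous, zeroed block of H sub-arrays, each W x D floats. */
    float (*img)[W][D] = calloc(H, sizeof *img);
    if (img == NULL)
        return -1;

    img[0][0][0] = 3.0f;        /* row 0, column 0, channel 0 */
    img[2][4][1] = 7.5f;        /* row 2, column 4, channel 1 */

    /* Because the block is contiguous, element [y][x][d] sits at
     * flat offset (y * W + x) * D + d from the start of the allocation. */
    float *flat = (float *)img;
    int ok = flat[0] == 3.0f && flat[(2 * W + 4) * D + 1] == 7.5f;

    free(img);                  /* a single free releases the whole array */
    return ok ? 0 : -1;
}
```

Since there is only one allocation per array, each buffer can be released with a single free() as soon as it is no longer needed.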

By the way, I suggest reordering the kernel dimensions to ODEPTH x KSIZE x KSIZE x IDEPTH. That would make iterating over the kernel more cache-friendly.
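
A sketch of what the reordered layout might look like, assuming the same parameters as convolve above. The name convolve_reordered and the local accumulator are illustrative choices, not part of the original answer. With the kernel stored as [odepth][ksize][ksize][idepth], the innermost loop over id reads both kdata and idata at consecutive addresses.

```c
/* Illustrative variant of convolve() with the kernel laid out as
 * [odepth][ksize][ksize][idepth]. Requires C99 variable-length array
 * parameters. */
void convolve_reordered(int isize, int osize, int ksize, int stride, int pad,
                        int idepth, int odepth,
                        float idata[isize][isize][idepth],
                        float odata[osize][osize][odepth],
                        float kdata[odepth][ksize][ksize][idepth])
{
    for (int oy = 0; oy < osize; ++oy) {
        for (int ox = 0; ox < osize; ++ox) {
            for (int od = 0; od < odepth; ++od) {
                float acc = 0.0f;   /* accumulate locally, store once */
                for (int ky = 0; ky < ksize; ++ky) {
                    int iy = stride * oy + ky - pad;
                    if (iy < 0 || iy >= isize)
                        continue;   /* row falls in the zero padding */
                    for (int kx = 0; kx < ksize; ++kx) {
                        int ix = stride * ox + kx - pad;
                        if (ix < 0 || ix >= isize)
                            continue;   /* column falls in the zero padding */
                        /* id is the last index of both arrays: unit stride */
                        for (int id = 0; id < idepth; ++id)
                            acc += kdata[od][ky][kx][id] * idata[iy][ix][id];
                    }
                }
                odata[oy][ox][od] = acc;
            }
        }
    }
}
```

The kernel would then be allocated to match, for example float (*kdata)[3][3][3] = calloc(32, sizeof *kdata); for 32 output channels, a 3 x 3 window and 3 input channels.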

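【Comments】:

This is a very interesting approach, but I have a few questions. First, you use the variables 'yx' and later 'id', and these are not defined. Second, when you allocate the input, kernel, and output, they all have one dimension fewer than they should. That looks intentional, and I guess you can allocate all the data anyway. Can you tell me how you would import the data in this case?
@Ricardo, sorry, there were some typos. Please see the updated answer.
I also think you forgot to put the stride in the call to the function, but other than that it looks great. I had not thought about iterating over the output, which is much more efficient. I also get a warning when calling the function: "passing argument 10 of 'convolve' from incompatible pointer type [-Wincompatible-pointer-types]", and the same for argument 9. Is that a problem?
@Ricardo, yes, the args were swapped. Fixed.
So, for example, if I wanted to assign the value 3 to [0][0][0] of the input, what command would I use? Something like idata[0][0][0] = 3? Also, in this code you lay the data out as [y][x][d], where y is the column index and x is the row index, right?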