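I had a similar use case where I needed to create sliding windows over a batch of multi-channel images, and ended up with the function below. I've written a more in-depth blog post covering this in regards to manually creating a Convolution layer. The function implements the sliding windows and can also dilate the input array or add padding to it.
The function takes as input: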
input - Size of (Batch, Channel, Height, Width)
output_size - Depends on usage, comments below.
kernel_size - size of the sliding window you wish to create (square)
padding - amount of 0-padding added to the outside of the (H,W) dimensions
stride - stride the sliding window should take over the inputs
dilate - amount to spread the cells of the input. This adds 0-filled rows/cols between elements
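To make the dilate and padding parameters concrete, here is a minimal sketch of what those two steps do to a single 2x2 image. The array `x` is a made-up example; the `np.insert` and `np.pad` calls mirror the ones used in the function.

```python
import numpy as np

# Hypothetical toy input: batch of 1, 1 channel, 2x2 image
x = np.arange(1, 5, dtype=float).reshape(1, 1, 2, 2)

# Dilation: insert a zero row, then a zero column, between neighbouring elements
dilated = np.insert(x, range(1, x.shape[2]), 0, axis=2)
dilated = np.insert(dilated, range(1, x.shape[3]), 0, axis=3)

# Padding: add a border of zeros around the (H, W) dimensions only
padded = np.pad(dilated, ((0, 0), (0, 0), (1, 1), (1, 1)), mode='constant')

print(dilated[0, 0])   # 3x3: original values at the corners, zeros in between
print(padded.shape)    # (1, 1, 5, 5)
```

Dilation happens before padding, so the zero border is never itself dilated.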
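Typically, when performing a forward convolution, you do not need to dilate the input, so the output size can be found with the following formula (replace x with the input dimension):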
(x - kernel_size + 2 * padding) // stride + 1
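As a quick sanity check, the formula can be wrapped in a small helper. The name `conv_output_size` is only illustrative and is not part of the original code.

```python
def conv_output_size(x, kernel_size, padding=0, stride=1):
    """Spatial output size of a forward convolution without dilation."""
    return (x - kernel_size + 2 * padding) // stride + 1

# A 3x3 kernel with padding 1 and stride 1 keeps a 32-pixel dimension unchanged
print(conv_output_size(32, 3, padding=1, stride=1))  # 32
# A 5x5 kernel with no padding shrinks 28 to 24
print(conv_output_size(28, 5))                       # 24
# Stride 2 roughly halves the size
print(conv_output_size(8, 3, padding=1, stride=2))   # 4
```

Apply it once to the height and once to the width to get the last two entries of output_size.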
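When using this function to perform the backward pass of a convolution, use a stride of 1 and set output_size to the size of the forward pass's x-input.
The sample code, along with examples of using this function, can be found at this link.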
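The following is a rough sketch of that backward-pass recipe for the simplest case only: a forward pass with stride 1 and no padding. Several things here are my assumptions rather than the original code: the helper `windows_sketch` is a compact stand-in for the function (without dilation), and the padding of `k - 1` and the 180-degree kernel rotation are the standard choices for this case. The result is checked against a plain scatter-add loop.

```python
import numpy as np

def windows_sketch(x, output_size, kernel_size, padding=0, stride=1):
    """Hypothetical compact stand-in for getWindows (no dilation support)."""
    if padding:
        x = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    _, _, out_h, out_w = output_size
    b, c = x.shape[:2]
    sb, sc, sh, sw = x.strides
    return np.lib.stride_tricks.as_strided(
        x,
        (b, c, out_h, out_w, kernel_size, kernel_size),
        (sb, sc, stride * sh, stride * sw, sh, sw),
    )

rng = np.random.default_rng(0)
k = 3
x = rng.standard_normal((2, 3, 6, 6))      # forward input (B, C, H, W)
w = rng.standard_normal((4, 3, k, k))      # kernels (F, C, k, k)
dout = rng.standard_normal((2, 4, 4, 4))   # upstream gradient; forward output is 4x4

# Backward pass w.r.t. the input: stride 1, output_size = x.shape.
# Assumption: padding k - 1 (forward pass used stride 1, padding 0).
dout_windows = windows_sketch(dout, x.shape, k, padding=k - 1, stride=1)
w_rot = w[:, :, ::-1, ::-1]                # kernel rotated by 180 degrees
dx = np.einsum('bfhwkl,fckl->bchw', dout_windows, w_rot)

# Reference: scatter each output gradient back onto the patch that produced it
dx_ref = np.zeros_like(x)
for i in range(dout.shape[2]):
    for j in range(dout.shape[3]):
        dx_ref[:, :, i:i + k, j:j + k] += np.tensordot(dout[:, :, i, j], w, axes=([1], [0]))

print(dx.shape, np.allclose(dx, dx_ref))
```

For a forward stride greater than 1, the upstream gradient would also need to be dilated before the windows are taken, which this sketch leaves out.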
import numpy as np

def getWindows(input, output_size, kernel_size, padding=0, stride=1, dilate=0):
    working_input = input
    working_pad = padding

    # dilate the input if necessary
    # (as written, any nonzero `dilate` inserts exactly one zero row/column between neighbouring elements)
    if dilate != 0:
        working_input = np.insert(working_input, range(1, input.shape[2]), 0, axis=2)
        working_input = np.insert(working_input, range(1, input.shape[3]), 0, axis=3)

    # pad the input if necessary
    if working_pad != 0:
        working_input = np.pad(working_input, pad_width=((0,), (0,), (working_pad,), (working_pad,)), mode='constant', constant_values=(0.,))

    # only the spatial part of output_size is used; batch and channel counts come from the input
    _, _, out_h, out_w = output_size
    out_b, out_c, _, _ = input.shape
    batch_str, channel_str, kern_h_str, kern_w_str = working_input.strides

    # 6-D view of shape (batch, channel, out_h, out_w, kernel_h, kernel_w);
    # it shares memory with working_input, so treat it as read-only
    return np.lib.stride_tricks.as_strided(
        working_input,
        (out_b, out_c, out_h, out_w, kernel_size, kernel_size),
        (batch_str, channel_str, stride * kern_h_str, stride * kern_w_str, kern_h_str, kern_w_str)
    )
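As an illustration of how the returned windows might be used, the sketch below runs a forward convolution by contracting the windows with a kernel through `np.einsum`, then compares the result with a naive loop. The helper `windows_sketch` is a hypothetical compact restatement of the function above (without dilation) so that the snippet runs on its own; the shapes and random data are made up.

```python
import numpy as np

def windows_sketch(x, output_size, kernel_size, padding=0, stride=1):
    """Hypothetical compact stand-in for getWindows (no dilation support)."""
    if padding:
        x = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    _, _, out_h, out_w = output_size
    b, c = x.shape[:2]
    sb, sc, sh, sw = x.strides
    return np.lib.stride_tricks.as_strided(
        x,
        (b, c, out_h, out_w, kernel_size, kernel_size),
        (sb, sc, stride * sh, stride * sw, sh, sw),
    )

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))    # (Batch, Channel, Height, Width)
w = rng.standard_normal((4, 3, 3, 3))    # (out_channels, in_channels, k, k)
k, pad, stride = 3, 1, 2

# Output size from the formula: (x - kernel_size + 2 * padding) // stride + 1
out_h = (x.shape[2] - k + 2 * pad) // stride + 1
out_w = (x.shape[3] - k + 2 * pad) // stride + 1

windows = windows_sketch(x, (2, 4, out_h, out_w), k, padding=pad, stride=stride)
out = np.einsum('bchwkl,fckl->bfhw', windows, w)   # sum over channel and kernel axes

# Naive reference: slide the kernel over the padded input explicitly
xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
ref = np.zeros((2, 4, out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = xp[:, :, i * stride:i * stride + k, j * stride:j * stride + k]
        ref[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))

print(out.shape, np.allclose(out, ref))
```

Because `as_strided` performs no bounds checking, an output_size larger than the formula allows would read memory outside the array, so it is worth computing out_h and out_w from the formula rather than by hand.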