Subsample a 1-D array using 2-D indices in numpy: answers

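【Question title】: Subsample 1-D array using 2-D indices in numpy
【Posted】: 2013-04-20 20:27:53
【Question】:

Background: The data I am working with is extracted from a netCDF4 object, which creates a numpy masked array on initialization but does not seem to support the numpy reshape() method. The data can therefore only be reshaped after all of it has been copied, which is far too slow.

Question: How can I subsample a 1-D array (essentially a flattened 2-D array) without reshaping it?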

import numpy as np

a1 = np.array([[1,2,3,4],
               [11,22,33,44],
               [111,222,333,444],
               [1111,2222,3333,4444],
               [11111,22222,33333,44444]])

a2 = np.ravel(a1)

rows, cols = a1.shape

row1 = 1
row2 = 3

col1 = 1
col2 = 3

I would like a fast slicing method that does not require reshaping the 1-D array into a 2-D array.

Desired output:

np.ravel(a1[row1:row2, col1:col2])

>> array([ 22,  33, 222, 333])

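I can compute the start and end positions, but slicing between them selects all the data between those two points (i.e. it includes the extra columns).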

idx_start = (row1 * cols) + col1
idx_end   = (row2 * cols) + col2

Update: I just tried Jaime's brilliant answer, but it seems netCDF4 does not allow 2-D indexing.

z = dataset.variables["z"][idx]
  File "netCDF4.pyx", line 2613, in netCDF4.Variable.__getitem__ (netCDF4.c:29583)
  File "/usr/local/lib/python2.7/dist-packages/netCDF4_utils.py", line 141, in _StartCountStride
    raise IndexError("Index cannot be multidimensional.")
IndexError: Index cannot be multidimensional.

【Comments】:

Tags: python numpy indexing slice

【Answer 1】:

Here is a lean suggestion:

a1 = np.array([[1,2,3,4],
               [11,22,33,44],
               [111,222,333,444],
               [1111,2222,3333,4444],
               [11111,22222,33333,44444]])

row1 = 1; row2 = 3; ix = slice(row1,row2)
col1 = 1; col2 = 3; iy = slice(col1,col2)
n = (row2-row1)*(col2-col1)

print(a1[ix,iy]);    print()
print(a1[ix,iy].reshape(1,n))
Output:
[[ 22  33]
 [222 333]]

[[ 22  33 222 333]]

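Reshaping is not expensive in NumPy (for a contiguous array it returns a view without copying the data), and slicing is fast.

【Comments】:

【Answer 2】: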
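To illustrate the claim about cost, here is a minimal sketch, assuming the data is already an ordinary in-memory NumPy array (which may not hold for a netCDF4 variable). The array `a2` is a stand-in, not the asker's data. It checks with `np.shares_memory` which steps copy and which do not.

```python
import numpy as np

rows, cols = 5, 4
a2 = np.arange(rows * cols)            # stand-in for the flat 1-D data

view2d = a2.reshape(rows, cols)        # contiguous input: reshape returns a view
print(np.shares_memory(a2, view2d))    # True

sub = view2d[1:3, 1:3]                 # basic slicing: still a view
flat_sub = sub.ravel()                 # non-contiguous block, so ravel copies it
print(flat_sub)                        # [ 5  6  9 10]
print(np.shares_memory(a2, flat_sub))  # False
```

Only the final `ravel()` copies anything, and it copies just the four selected elements.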

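I came up with this. It does not copy all of the data, but it still copies data I do not want into memory. It can probably be improved, and I hope there is a better solution.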

zi = 0
# Create a zero array with the appropriate length for the data subset
z = np.zeros((col2 - col1) * (row2 - row1))
# Loop over the rows from which data is being extracted;
# i is the absolute row index, so the row starts at flat offset i*cols
for i in range(row1, row2):
    # Pull the whole row, then keep only the desired columns of that row
    tmp = dataset.variables["z"][(i*cols):((i*cols)+cols)][col1:col2]
    # Add each item in the buffer sequentially to the data array
    for j in tmp:
        z[zi] = j
        # Keep a count of the index position the next data point goes to
        zi += 1
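The loop above reads each full row before discarding the unwanted columns. A possible refinement, sketched here under the assumption that the source supports plain 1-D slicing and stores the data in row-major order, is to slice only the wanted columns of each row. The helper name `subsample_flat` is hypothetical, and a NumPy array stands in for the netCDF4 variable.

```python
import numpy as np

def subsample_flat(flat, cols, row1, row2, col1, col2):
    """Copy the block [row1:row2, col1:col2] out of a row-major flat
    array using only 1-D slices (hypothetical helper)."""
    width = col2 - col1
    out = np.empty((row2 - row1) * width, dtype=flat.dtype)
    for i, row in enumerate(range(row1, row2)):
        start = row * cols + col1          # flat offset of the first wanted element
        out[i * width:(i + 1) * width] = flat[start:start + width]
    return out

a2 = np.array([[1, 2, 3, 4],
               [11, 22, 33, 44],
               [111, 222, 333, 444],
               [1111, 2222, 3333, 4444],
               [11111, 22222, 33333, 44444]]).ravel()

print(subsample_flat(a2, 4, 1, 3, 1, 3))   # [ 22  33 222 333]
```

This issues one read per row of the subset, so it is likely to pay off mainly when the subset is much narrower than a full row.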

【Comments】:

【Answer 3】:

A combination of np.ogrid and np.ravel_multi_index gets you what you want:

>>> a1
array([    1,     2,     3,     4,    11,    22,    33,    44,   111,
         222,   333,   444,  1111,  2222,  3333,  4444, 11111, 22222,
       33333, 44444])
>>> idx = np.ravel_multi_index((np.ogrid[1:3,1:3]), (5, 4))
>>> a1[idx]
array([[ 22,  33],
       [222, 333]])

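You can of course ravel this array to get a 1-D result, if that is what you are after. Also note that this is a copy of your original data, not a view.

EDIT: You can keep the same general approach, replacing np.ogrid with np.mgrid and reshaping it, to get a flat result: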

>>> idx = np.ravel_multi_index((np.mgrid[1:3,1:3].reshape(2, -1)), (5, 4))
>>> a1[idx]
array([ 22,  33, 222, 333])
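The same flat indices can also be built with plain broadcasting arithmetic. The sketch below assumes row-major layout and reuses the question's `row1`, `row2`, `col1`, `col2` values. The result is a 1-D integer index array; whether a particular netCDF4 version accepts such an index is not verified here.

```python
import numpy as np

rows, cols = 5, 4
row1, row2, col1, col2 = 1, 3, 1, 3

r = np.arange(row1, row2)[:, None]     # column vector of row numbers
c = np.arange(col1, col2)[None, :]     # row vector of column numbers
idx = (r * cols + c).ravel()           # flat index = row * cols + col
print(idx)                             # [ 5  6  9 10]

# Cross-check against np.ravel_multi_index
expected = np.ravel_multi_index(
    tuple(np.mgrid[row1:row2, col1:col2].reshape(2, -1)), (rows, cols))
assert np.array_equal(idx, expected)
```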

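【Comments】:

Getting a 1-D result matters less than being able to take the slice from the original 1-D array. Checking this out now. It looks perfect. Thanks!
Shouldn't the np.ravel_multi_index dims here be (5,4) rather than (4,4)?
@shootingstars Yes, my bad, I miscounted. I have edited the answer.
Thanks. After trying this (great solution), it seems netCDF4 does not like 2-D indices. Any suggestions? I have added the error to my question.
Just tried it and got "Killed". This may be something I did, but I will have to recheck in the morning. Thanks for the help, Jaime!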