Avoid buffering when shuffling data: answers

【Question title】: Avoid buffering when shuffling data
【Posted】: 2017-08-24 17:38:10
【Question】:

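I've been struggling to find a good name for this question, as well as a good answer (which may already exist somewhere :/), so I'm open to any renaming ideas.

I'm working with numpy arrays in which each row holds the data for one object, typically something like features = [feature0, feature1].

Before using this array for learning, I shuffle it. While using it (after the shuffle), I increasingly need the features of the previous i rows alongside the current row.

To get them, I use a buffer: I build a new array in which row N looks like [featuresN-i, ..., featuresN-1, featuresN], and then I shuffle that array.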
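
For illustration, the buffered approach described above might look roughly like the following sketch. The sizes, the variable names (`features`, `buffered`, `i`) and the choice to drop the first i rows (which have no full history) are assumptions made for this example, not details from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_features, i = 10, 2, 3  # illustrative sizes (assumed)
features = np.arange(n_rows * n_features, dtype=float).reshape(n_rows, n_features)

# Window k holds rows k .. k+i of `features`, i.e. the window ending at
# original row N = k + i. The first i rows have no full history and are skipped.
buffered = np.stack([features[k:k + i + 1] for k in range(n_rows - i)])

# Shuffle the windows in place along the first axis only;
# the rows inside each window keep their order.
rng.shuffle(buffered)

print(buffered.shape)  # (n_rows - i, i + 1, n_features)
```

Each original row is copied into up to i + 1 windows, which is where the extra memory goes.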

I'd like to know whether there is a way to shuffle the indices instead and get a similar 3d array from some something_function applied to my 2d array:

original_array.something_function(shuffled_index[N:M]) 
-> [
    [[features of shuffled_index[ N ] - i],
                   ...                    ,
     [features of shuffled_index[ N ]    ]], 
    [[features of shuffled_index[N+1] - i],
                   ...                    ,
     [features of shuffled_index[N+1]    ]],
                  .....                    ,
    [[features of shuffled_index[ M ] - i],
                   ...                    ,
     [features of shuffled_index[ M ]    ]]
   ]

If there is, is it worth calling it in order to reduce the size of my buffered array by a factor of i?

Any hints are welcome.

【Comments】:

Tags: python numpy buffer shuffle

【Solution 1】:

As you have realised yourself: don't shuffle the array. Shuffle the indices.

import numpy as np

# create data
nrows = 100
ncols = 4
arr = np.random.rand(nrows, ncols)

# create indices and shuffle
indices = np.arange(nrows)
np.random.shuffle(indices) # in-place operation!

# loop over shuffled indices, do stuff with array
for ii in indices:
    # ii - 1 wraps around through negative indexing when ii == 0;
    # (ii + 1) % nrows wraps around at the last row
    print(ii, arr[[ii - 1, ii, (ii + 1) % nrows]])
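
The same idea can be applied to a whole slice of shuffled indices at once, which gives the 3d array the question asks for. The following is only a sketch: `gather_windows` is a hypothetical helper name, and it assumes the window is the current row plus its i predecessors. Unlike the loop above, it restricts the shuffled indices to rows that have a full history, so nothing wraps around.

```python
import numpy as np

def gather_windows(arr, idx, i):
    """For each index n in idx, return rows n-i .. n of arr (hypothetical helper)."""
    idx = np.asarray(idx)
    offsets = np.arange(-i, 1)              # -i, ..., -1, 0
    rows = idx[:, None] + offsets[None, :]  # shape (len(idx), i + 1)
    return arr[rows]                        # shape (len(idx), i + 1, ncols)

rng = np.random.default_rng(0)
nrows, ncols, i = 100, 4, 3                 # illustrative sizes (assumed)
arr = rng.random((nrows, ncols))

# Shuffle only the indices that have i previous rows available,
# so no negative index reaches back to the end of the array.
indices = rng.permutation(np.arange(i, nrows))

# Build the windows for one batch of shuffled indices on demand.
batch = gather_windows(arr, indices[0:16], i)
print(batch.shape)  # (16, i + 1, ncols)
```

With this approach only the original 2d array and the 1d index array stay in memory; the (i + 1)-row windows are materialised per batch rather than for the whole data set.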

【Discussion】: