Randomly shuffle items in each row of numpy array: answers

【Question title】: Randomly shuffle items in each row of numpy array
【Posted】: 2018-11-06 07:59:55
【Question description】:

I have a numpy array like the following:

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

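I want to shuffle the items of each row individually, but I do not want the shuffle to be the same for every row (as in several examples that just shuffle the column order).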

For example, I would like an output like the following:

output = np.array([[3, 2, 1],
                   [4, 6, 5],
                   [7, 3, 1]])

How can I randomly shuffle each row efficiently? My actual np array has more than 100000 rows and 1000 columns.

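【Comments】:

@Kasramvd According to the np docs, multi-dimensional arrays are only shuffled along the first axis: >>> arr = np.arange(9).reshape((3, 3)) >>> np.random.shuffle(arr) >>> arr array([[3, 4, 5], [6, 7, 8], [0, 1, 2]])
Yes, shuffle() does not accept an axis parameter. Here is a similar question: stackoverflow.com/questions/50415972/…
@Kasramvd If I understand correctly, that question wants to change the row order, not the actual values within the rows.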

Tags: python arrays numpy

【Solution 1】:

Since you only want to shuffle the columns, you can simply perform the shuffling on the transpose of the matrix:

In [86]: np.random.shuffle(Xtrain.T)

In [87]: Xtrain
Out[87]: 
array([[2, 3, 1],
       [5, 6, 4],
       [7, 3, 1]])

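Note that random.shuffle() on a 2D array shuffles the rows, not the items within each row; that is, it changes the positions of the rows. So if you change the positions of the rows of the transposed matrix, you are actually shuffling the columns of the original array, and every row receives the same column order.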
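
To make the point about the transpose concrete, here is a small check (a sketch only; the seed and the helper names `original`, `perm`, and `same_for_all_rows` are my own, not from the answer). It recovers the column permutation from one row and confirms that the same permutation explains every row:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
original = Xtrain.copy()

# Xtrain.T is a view, so shuffling its rows reorders Xtrain's columns in place.
np.random.shuffle(Xtrain.T)

# Row 1 of the original ([4, 5, 6]) has distinct values,
# so it identifies the single column permutation that was applied.
perm = [int(np.where(original[1] == v)[0][0]) for v in Xtrain[1]]

# Applying that one permutation to every row of the original reproduces the result.
same_for_all_rows = np.array_equal(original[:, perm], Xtrain)
print(perm, same_for_all_rows)
```

Because one permutation accounts for all rows, this approach does not give the independent per-row shuffle the question asks for.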

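If you still want a completely independent shuffle, you can create random indices for each row and then build the final array with simple indexing: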

In [172]: def crazyshuffle(arr):
     ...:     x, y = arr.shape
     ...:     rows = np.indices((x,y))[0]
     ...:     cols = [np.random.permutation(y) for _ in range(x)]
     ...:     return arr[rows, cols]
     ...:

Demo:

In [173]: crazyshuffle(Xtrain)
Out[173]: 
array([[1, 3, 2],
       [6, 5, 4],
       [7, 3, 1]])

In [174]: crazyshuffle(Xtrain)
Out[174]: 
array([[2, 3, 1],
       [4, 6, 5],
       [1, 3, 7]])
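
As a variation on the same idea, the row index can be a column vector that broadcasts against the per-row permutations instead of a full grid from np.indices. This is only a sketch; the name `crazyshuffle_broadcast` is hypothetical, and the final lines just check that each row keeps its own elements:

```python
import numpy as np

def crazyshuffle_broadcast(arr):
    """Return a copy of 2-D `arr` with each row permuted independently."""
    n_rows, n_cols = arr.shape
    # One independent permutation of the column indices per row: shape (n_rows, n_cols).
    cols = np.array([np.random.permutation(n_cols) for _ in range(n_rows)])
    # A column vector of row numbers, shape (n_rows, 1), broadcasts against `cols`.
    rows = np.arange(n_rows)[:, None]
    return arr[rows, cols]

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
shuffled = crazyshuffle_broadcast(Xtrain)
print(shuffled)

# Sorting each row must give the same values before and after the shuffle.
rows_preserved = np.array_equal(np.sort(shuffled, axis=1), np.sort(Xtrain, axis=1))
print(rows_preserved)
```

The Python-level loop that builds the permutations is still the slow part for 100000 rows; only the indexing step is vectorized.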

【Discussion】:

【Solution 2】:

From: https://github.com/numpy/numpy/issues/5173

def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.

    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis.  `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return
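
For comparison, newer NumPy releases provide `Generator.permuted`, which shuffles each slice along an axis independently without an explicit Python loop. The sketch below assumes NumPy 1.20 or later (this is not part of the original answer), and the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

# Returns a shuffled copy; each row (slice along axis=1) is permuted independently.
shuffled = rng.permuted(Xtrain, axis=1)

# Passing out= shuffles an existing array in place instead of allocating a copy.
in_place = Xtrain.copy()
rng.permuted(in_place, axis=1, out=in_place)

print(shuffled)
print(in_place)
```

On older NumPy versions the `disarrange` loop above remains the straightforward option.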

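【Discussion】:

【Solution 3】:

This solution is not efficient by any means, but I had fun thinking about it, so I wrote it down. Basically, you ravel the array and create an array of row labels and an array of indices. You shuffle the index array and use it to index both the original array and the row-label array. Then you apply a stable argsort to the row labels to gather the data back into rows. Apply that index, reshape, and voilà, the data is shuffled independently by row: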

import numpy as np

r, c = 3, 4  # x.shape

x = np.arange(12) + 1  # Already raveled 
inds = np.arange(x.size)
rows = np.repeat(np.arange(r).reshape(-1, 1), c, axis=1).ravel()

np.random.shuffle(inds)
x = x[inds]
rows = rows[inds]

inds = np.argsort(rows, kind='mergesort')
x = x[inds].reshape(r, c)

Here is the IDEOne link.
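
The same steps can be wrapped in a function for an arbitrary 2-D array. This is a sketch under my own naming (`shuffle_rows_via_labels` is hypothetical), using `kind='stable'` for the stable sort that the description calls for:

```python
import numpy as np

def shuffle_rows_via_labels(x):
    """Return a copy of 2-D `x` with each row shuffled independently."""
    r, c = x.shape
    flat = x.ravel()
    labels = np.repeat(np.arange(r), c)      # row label of every flat element
    perm = np.random.permutation(flat.size)  # one global shuffle of all elements
    flat, labels = flat[perm], labels[perm]
    # A stable sort by row label sends every element back to its row
    # while keeping the shuffled relative order within that row.
    order = np.argsort(labels, kind='stable')
    return flat[order].reshape(r, c)

x = (np.arange(12) + 1).reshape(3, 4)
result = shuffle_rows_via_labels(x)
print(result)
```

The stability of the sort matters: an unstable sort could reorder elements that share a row label, although the result would still be some permutation within each row.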

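【Discussion】:

【Solution 4】:

We can create a random 2-D matrix, argsort it along each row, and then use the index matrix returned by argsort to reorder the target matrix.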

target = np.random.randint(10, size=(5, 5))
# [[7 4 0 2 5]
# [5 6 4 8 7]
# [6 4 7 9 5]
# [8 6 6 2 8]
# [8 1 6 7 3]]

shuffle_helper = np.argsort(np.random.rand(5,5), axis=1)
# [[0 4 3 2 1]
# [4 2 1 3 0]
# [1 2 3 4 0]
# [1 2 4 3 0]
# [1 2 3 0 4]]

target[np.arange(shuffle_helper.shape[0])[:, None], shuffle_helper]
# array([[7, 5, 2, 0, 4],
#       [7, 4, 6, 8, 5],
#       [4, 7, 9, 5, 6],
#       [6, 6, 8, 2, 8],
#       [1, 6, 7, 8, 3]])

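Explanation

We use np.random.rand and argsort to simulate the effect of shuffling.
random.rand supplies the randomness.
Then we use argsort with axis=1 to rank each row. This creates the indices that can be used for reordering.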
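
The explanation above can be condensed into a reusable function. This is a minimal sketch, not the answerer's code: the name `shuffle_rows_argsort` and the optional `rng` parameter are my own, and `np.take_along_axis` replaces the explicit row-index array:

```python
import numpy as np

def shuffle_rows_argsort(target, rng=None):
    """Return a copy of 2-D `target` with each row shuffled independently."""
    rng = np.random.default_rng() if rng is None else rng
    # Random keys, one per element; their per-row ranking is a random permutation.
    keys = rng.random(target.shape)
    order = np.argsort(keys, axis=1)
    # Pick target[i, order[i, j]] for every (i, j).
    return np.take_along_axis(target, order, axis=1)

target = np.arange(25).reshape(5, 5)
shuffled = shuffle_rows_argsort(target, rng=np.random.default_rng(0))
print(shuffled)
```

Sorting costs O(n log n) per row rather than the O(n) of a direct shuffle, but the work happens inside NumPy rather than in a Python loop.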

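【Discussion】:

Would this be better than sorting the original rows directly?
@MadPhysicist Sorting the original directly would give the same result every time, with no randomness.
I just got it. You are using the fact that argsort has an axis parameter to make up for shuffle not having one. Clever.
@MadPhysicist Exactly! Thanks for taking the time to understand the idea.

【Solution 5】:

Suppose you have an array a with shape 100000 x 1000.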

b = np.random.choice(100000 * 1000, (100000, 1000), replace=False)
ind = np.argsort(b, axis=1)
a_shuffled = a[np.arange(100000)[:,np.newaxis], ind]

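I don't know whether this is faster than a loop, since it requires sorting, but starting from this solution you may come up with something better, for example using np.argpartition instead of np.argsort.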
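
One practical concern with an array of this size is the memory needed for the full 100000 x 1000 index matrix. A possible workaround, offered only as a sketch, is to process the rows in blocks and shuffle in place. The function name `shuffle_rows_chunked`, the `chunk_rows` default, and the use of random floats as sort keys instead of `np.random.choice` are all my own assumptions:

```python
import numpy as np

def shuffle_rows_chunked(a, chunk_rows=1024, rng=None):
    """Shuffle each row of 2-D `a` in place, `chunk_rows` rows at a time."""
    rng = np.random.default_rng() if rng is None else rng
    for start in range(0, a.shape[0], chunk_rows):
        block = a[start:start + chunk_rows]  # basic slice: a view into `a`
        # Index matrix for this block only, so temporary memory stays bounded.
        order = np.argsort(rng.random(block.shape), axis=1)
        # The right-hand side is a new array, so writing it back into the view is safe.
        block[...] = np.take_along_axis(block, order, axis=1)
    return a

a = np.arange(50).reshape(10, 5)
expected_sorted = a.copy()
shuffle_rows_chunked(a, chunk_rows=3, rng=np.random.default_rng(1))
print(a)
```

Whether this beats a plain loop over rows depends on the array shape and the chunk size, so it is worth timing on the real data.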

【Discussion】:

【Solution 6】:

You can use Pandas:

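import numpy as np
import pandas as pd
df = pd.DataFrame(Xtrain)  # Xtrain: the 2-D array from the question
# Relies on x.values being a writable view of df's data, which pandas does not guarantee.
_ = df.apply(lambda x: np.random.shuffle(x.values), axis=1, raw=False)
df.values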

If you want to shuffle each column instead, change the keyword to axis=0.

【Discussion】: