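Shuffle array while spacing repeating elements

【Question title】: Shuffle array while spacing repeating elements
【Posted】: 2019-04-17 02:25:49
【Question】:

I am trying to write a function that shuffles an array containing repeating elements, while ensuring that the repeated elements do not end up too close to one another.

This code works, but it seems inefficient to me: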

function shuffledArr = distShuffle(myArr, myDist)
% This function takes an array myArr and shuffles it, while ensuring that
% repeating elements are separated by at least myDist other elements.

% distinct values to check (they need not be the integers 1..K)
vals = unique(myArr);

% flag to indicate whether there are repetitions within myDist
reps = true;
while reps

    % set to false to break the while-loop; set back to true if the condition is not met
    reps = false;

    % randomly shuffle array
    shuffledArr = Shuffle(myArr);

    % loop through each unique value, find its positions, and calculate the distance to the next occurrence
    for k = 1:numel(vals)
        % check if there are any repetitions that are separated by myDist or less
        if any(diff(find(shuffledArr == vals(k))) <= myDist)
            reps = true;
            break;
        end
    end
end
end

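This seems suboptimal to me for three reasons:

1) It may not be necessary to shuffle repeatedly until a solution is found.

2) If no solution is possible (i.e. myDist is set too high for any valid arrangement to exist), this while-loop will run forever. Any ideas on how to catch this in advance?

3) There must be a simpler way to determine the distance between repeating elements in an array than looping through each unique value.

I would appreciate answers to points 2 and 3, even if point 1 is correct and this can be done in a single pass.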

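【Comments】:

For example, if a given array has 8 valid shuffles, does your algorithm only have to find one solution, or must it return each of them at random with probability 1/8? The second case is harder to achieve.
Do you know bogosort? It tries to sort an array by randomizing it, then checks whether it is sorted, and randomizes it again if not. This is obviously not a sorting algorithm anyone should choose, but what you are doing sounds very similar (somewhat weaker, admittedly). Is the randomization necessary? Or would, for example, maximizing sum(abs(diff(shuffledArr))) be enough?
@obchardon Any solution will do.
@NickyMattsson: I had never heard of bogosort, thanks for the tip! Randomization is necessary in this case, because this is for creating randomly ordered stimuli for a psychology experiment. Unfortunately, maximizing the distance would not meet my goal of making the stimulus presentation look natural (i.e. pseudorandom).
If the array is not too large, you could generate all permutations, keep those that satisfy the minimum distance, and then pick one. Is that an option?

Tags: arrays matlab random shuffle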

【Solution 1】:

I think checking the following condition is enough to prevent the infinite loop:

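[~, num, C] = mode(myArr);   % num: count of the most frequent value; C{1}: all values with that count
N = numel(C{1});             % number of modes (C itself is a 1x1 cell for a vector input)
% N*num + (myDist-N+1)*(num-1) is the shortest length that keeps every mode myDist apart;
% when myDist < N the other modes alone already fill the gaps
assert( (myDist < N) || (myDist-N+1) * (num-1) + N*num <= numel(myArr), ...
'Shuffling impossible!');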

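Suppose myDist is 2 and we have the following data:

[4 6 5 1 6 7 4 6]

We can find the mode, 6, and its number of occurrences, 3. We arrange the 6s so that myDist = 2 blanks separate them:

6 _ _ 6 _ _ 6

There must be (3-1) * myDist = 4 numbers available to fill the blanks. We have five other numbers, so the array can be shuffled.

The problem becomes more complicated if there are several modes. For example, for the array [4 6 5 1 6 7 4 6 4] we have N = 2 modes: 6 and 4. They can be arranged as:

6 4 _ 6 4 _ 6 4

We have 2 blanks and three other numbers [5 1 7] that can be used to fill them. If, for example, we had only one number [5], the blanks could not be filled and the array could not be shuffled.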
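
As a rough check of the counting argument above, the sketch below evaluates the length condition for the three arrays discussed in this answer. It counts occurrences with unique and accumarray rather than mode; the variable names cases, needed and feasible are illustrative choices, not part of the original answer.

```matlab
myDist = 2;
% the two example arrays from the text, plus the variant where only [5] is left as filler
cases = {[4 6 5 1 6 7 4 6], [4 6 5 1 6 7 4 6 4], [4 6 5 6 4 6 4]};

needed   = zeros(1, numel(cases));
feasible = false(1, numel(cases));
for k = 1:numel(cases)
    myArr = cases{k};
    [~, ~, idx] = unique(myArr);
    counts = accumarray(idx(:), 1);   % occurrences of each distinct value
    num = max(counts);                % occurrences of the mode(s)
    N   = sum(counts == num);         % number of distinct modes
    % shortest array that keeps every mode myDist apart
    needed(k)   = (myDist - N + 1) * (num - 1) + N * num;
    feasible(k) = needed(k) <= numel(myArr);
end
```

With myDist = 2 this gives needed = [7 8 8] against array lengths of 8, 9 and 7, so the first two arrays can be shuffled and the third cannot.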

For the third point, you can use a sparse matrix to speed up the computation (my initial tests in Octave suggest it is more efficient):

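function shuffledArr = distShuffleSparse(myArr, myDist)
    [U,~,idx] = unique(myArr);   % idx maps each element to its unique value in U
    reps = true;
    while reps 
        S = Shuffle(idx);
        % rows are positions, columns are unique values; the myDist extra rows keep
        % the linear indices of different columns more than myDist apart
        shuffledBin = sparse ( 1:numel(idx), S, true, numel(idx) + myDist, numel(U) );
        % find returns column-major linear indices, so a small gap means a close repeat of one value
        reps = any (diff(find(shuffledBin)) <= myDist);
    end
    shuffledArr = U(S);
end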

You can also use sub2ind and sort instead of a sparse matrix:

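function shuffledArr = distShuffleSparse(myArr, myDist)
    [U,~,idx] = unique(myArr);
    idx = idx(:);                % column vector, so both subscript vectors below have the same shape
    reps = true;
    while reps 
        S = Shuffle(idx);
        % linear index = position + (value index - 1) * (numel(idx) + myDist)
        f = sub2ind ( [numel(idx) + myDist, numel(U)] , (1:numel(idx)).', S );
        reps = any (diff(sort(f)) <= myDist);
    end
    shuffledArr = U(S);
end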
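
To see why the padded linear indices detect close repeats, here is a small worked example. The index vectors S_bad and S_good and the value count nVals are made up for illustration; they stand in for a shuffled idx with three unique values.

```matlab
myDist = 2;
nVals  = 3;                        % number of unique values (hypothetical)
S_bad  = [1 2 1 3 2];              % value 1 repeats only 2 positions apart
S_good = [1 2 3 1 2];              % every repeat is 3 positions apart
nRows  = numel(S_bad) + myDist;    % 5 positions + 2 padding rows = 7

% linear index = position + (value - 1) * nRows
f_bad  = sub2ind([nRows, nVals], 1:numel(S_bad),  S_bad);    % [1 9 3 18 12]
f_good = sub2ind([nRows, nVals], 1:numel(S_good), S_good);   % [1 9 17 4 12]

% after sorting, indices of the same value are adjacent; the padding rows
% guarantee that indices of different values differ by more than myDist
reps_bad  = any(diff(sort(f_bad))  <= myDist);   % true
reps_good = any(diff(sort(f_good)) <= myDist);   % false
```

The padding rows are what make a single diff sufficient: without them, the last position of one value and the first position of the next could produce a small gap that is not a real repeat.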

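【Comments】:

Thank you very much! I think obchardon's approach is a better fit here, but this made me aware of the advantages of sparse matrices!
I have updated my answer with a solution that does not use a sparse matrix.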

【Solution 2】:

If you only want to find one possible solution, you could use something like this:

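x = [1   1   1   2   2   2   3   3   3   3   3   4   5   5   6   7   8   9];
n = numel(x);
dist = 3;           % minimal distance
uni = unique(x);    % get the unique values
his = histc(x,uni); % count the occurrences of each element
% columns of s: value, remaining occurrences, penalty (rows sorted by occurrences, descending)
s = [sortrows([uni;his].',2,'descend'), zeros(length(uni),1)];

xr = [];            % the vector that will contain the solution

% the for loop that will maximize the distance between repeats of each element
for ii = 1:n
    s(s(:,3)<0,3) = s(s(:,3)<0,3)+1;   % penalized elements recover by one step
    s(1,3) = s(1,3)-dist;              % penalize the element being placed
    s(1,2) = s(1,2)-1;                 % one fewer occurrence left to place
    xr = [xr s(1,1)];
    % unpenalized rows first, then rows with the most remaining occurrences
    s = sortrows(s,[3,2],{'descend','descend'});
end

if any(s(:,2)~=0)
    fprintf('failed, dist is too big\n')
end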

Result:

xr = [3   1   2   5   3   1   2   4   3   6   7   8   3   9   5   1   2   3]

Explanation:

I create a matrix s, which initially equals:

s =

   3   5   0
   1   3   0
   2   3   0
   5   2   0
   4   1   0
   6   1   0
   7   1   0
   8   1   0
   9   1   0

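% col1 = unique element; col2 = occurrences of each element; col3 = penalties

In each iteration of the for loop we pick the unpenalized element with the most remaining occurrences, because that element will be the hardest to fit into our array.

After the first iteration, s equals: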

s =

   1   3   0  %1 is the next element that will be placed in our array.
   2   3   0
   5   2   0
   4   1   0
   6   1   0
   7   1   0
   8   1   0
   9   1   0
   3   4  -3  % 3 now has 5-1 = 4 occurrences left and a penalty of -3, so it won't show up for the next 3 iterations.

At the end, every number in the second column should equal 0; if not, the minimum distance is too large.
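
One way to double-check a result such as xr, independently of the bookkeeping in s, is to measure the gaps between repeats directly. The sketch below is an addition rather than part of the original answer; the names minGap and isValid are chosen here. It relies on sort being stable, so equal values keep their positions in ascending order, and it needs no loop over the unique values.

```matlab
xr   = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3];   % result shown above
dist = 3;

[sortedVals, pos] = sort(xr);        % pos lists the positions of each value in ascending order
sameVal = diff(sortedVals) == 0;     % neighbours in sorted order that hold the same value
gaps    = diff(pos);                 % position differences between those neighbours
minGap  = min(gaps(sameVal));        % smallest distance between two equal elements ([] if no repeats)

% valid if every repeat is more than dist positions after the previous one
isValid = isempty(minGap) || minGap > dist;
```

For the result above the smallest gap is 4 (for example, the 3s at positions 1 and 5), so the order satisfies dist = 3.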

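【Comments】:

Excellent approach. Since my experiment needs a pseudorandom order, I tweaked your code slightly by shuffling all acceptable candidates at each iteration (i.e. all rows of s with penalty = 0 and the maximum number of occurrences).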
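
The comment above does not include code. A minimal sketch of what such a randomized tie-break might look like is given below, under some assumptions: it tracks the last position of each value (lastPos, a name chosen here) instead of the penalty column in s, and it picks at random with randi among the eligible values that have the most remaining occurrences. It does not sample uniformly from all valid orders.

```matlab
x    = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];
dist = 3;

[uni, ~, idx] = unique(x);
counts  = accumarray(idx(:), 1).';   % remaining occurrences of each unique value
lastPos = -inf(size(uni));           % last position at which each value was placed
xr = zeros(1, numel(x));
ok = true;                           % set to false if no value can be placed

for ii = 1:numel(x)
    % a value is eligible if it has occurrences left and its previous
    % placement is more than dist positions back
    eligible = counts > 0 & (ii - lastPos) > dist;
    if ~any(eligible)
        ok = false;
        break;
    end
    % candidates: eligible values with the most remaining occurrences
    best = find(eligible & counts == max(counts(eligible)));
    pick = best(randi(numel(best)));   % random tie-break
    xr(ii) = uni(pick);
    counts(pick)  = counts(pick) - 1;
    lastPos(pick) = ii;
end
```

Because the most frequent value is always placed first, every run on this input starts with 3; only the order among equally frequent candidates varies between runs.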