有约束的样本/有约束的随机化答案

【问题标题】：Sample with constrain / Randomize with constrain有约束的样本/有约束的随机化
【发布时间】：2018-03-19 17:49:27
【问题描述】：

我有以下数据框

 design <- read.table(text =
"block position
 1     1
 1     2
 1     3
 1     4
 2     1
 2     2
 2     3
 2     4", header = TRUE)

我想在一个块内随机分配四个治疗。例如，我可以使用以下代码来做到这一点：

treatment <- letters[1:4]
set.seed(2)
design$treatment <- as.vector(replicate(2,sample(treatment, length(treatment))))

产生以下数据框

> design
 block position treatment
 1        1         a
 1        2         c
 1        3         b
 1        4         d
 2        1         d
 2        2         c
 2        3         a
 2        4         b

问题：在上面的例子中，治疗 c 是在位置 2 两次。一次治疗不应该在同一位置两次。我怎样才能做到这一点？

更笼统：有没有简单的约束采样解决方案？

【问题讨论】：

除非你真的需要随机性，否则最简单的方法就是延迟每个组，例如design$treatment <- unlist(lapply(seq(max(design$block)), function(x){letters[if (x == 1) 1:max(design$position) else c(seq(x, max(design$position)), seq(x - 1))]}))

标签： r random

【解决方案1】：

以下方法应确保（1）处理的随机性，以及（2）不同块在同一位置的不同处理。

我们使用gtools::permutations 计算letters[1:4] 的所有排列。我们将排列集存储在矩阵perm中。

# Calculate all permutations of letters[1:4]
library(gtools);
treatment <- letters[1:4];
perm <- permutations(length(treatment), length(treatment), treatment);

我们创建一个空的treatment 向量，它将逐块连续填充。
```
design$treatment <- "";
```

我们现在从perm 中随机抽取第一个block 的排列。一旦我们绘制了一个排列，我们就会从perm（即我们的排列集）中删除所有在相同位置具有任何个相同条目的排列。然后，我们从第二个block 的简化排列集中随机抽取一个排列。以此类推。

set.seed(2017);
for (i in 1:length(unique(design$block))) {
    smpl <- perm[sample(nrow(perm), 1), ];
    design$treatment[seq(1 + 4 * (i - 1), 4 * i)] <- smpl;
    # Remove all permutations with duplicated letters
    j <- 1;
    while (j <= nrow(perm)) {
        if (any(perm[j, ] == smpl)) perm <- perm[-j, ] else j <- j + 1;
    }
}
design;
#    block position treatment
#1     1        1         d
#2     1        2         c
#3     1        3         a
#4     1        4         b
#5     2        1         b
#6     2        2         a
#7     2        3         d
#8     2        4         c

删除 set.seed(...) 以使用随机种子。

【讨论】：

非常好！不幸的是，我的 reprex 大大简化了，我的真实数据有 330 次处理。因此，计算所有可能的排列不是一种选择。我略微采用了您的代码，因此它也适用于我的项目（请参阅下面的答案）。唯一的区别是，我只计算了一些而不是所有可能的排列。
@retodomax 同意，随着处理的增加，排列的数量会迅速增长。我喜欢你的方法；初始排列的数量需要足够大，以确保在连续删除具有重复条目的排列后可以绘制足够的排列。我想这有点反复试验。

【解决方案2】：

此解决方案适用于大量治疗，并且基于 answer of Maurits Evers。仅计算 1000 个排列，而不是所有可能的排列。

n_treat <- 20

# make large design file
design <- data.frame(block = rep(1:4, each = n_treat), position = rep(1:n_treat, 4))

# Calculate some (not all) random permutations
treatment <- 1:n_treat
perm <- t(replicate(1000,sample(treatment, length(treatment), replace = F)))

# Create empty treatment vector
design$treatment <- ""

# loop through all blocks,
# randomly draw a permutation from perm,
# remove permutations with identiacal entries at the same position.
set.seed(2017);
for (i in 1:length(unique(design$block))) {
  smpl <- perm[sample(nrow(perm), 1), ];
  design$treatment[seq(1 + n_treat * (i - 1), n_treat * i)] <- smpl;
  # Remove all permutations with duplicated letters
  j <- 1;
  while (j <= nrow(perm)) {
    if (any(perm[j, ] == smpl)) perm <- perm[-j, ] else j <- j + 1;
  }
}

【讨论】：