R: improving a time-consuming repeat function (answer)

【Question title】: R improvement of time consuming repeat function
【Posted】: 2017-06-15 06:52:09
【Question】:

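I need to generate random samples (only shuffling the values within each column), check whether each sample meets a condition, and store the "good" samples. I need 1000 random samples. With help from other posts I wrote the code below, but it is very time-consuming. Is there a better solution?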

ds = matrix(sample(0:1000, 120), ncol=20)

rep <- function(ds) {
  success <- FALSE
  while (!success) {
    x <- apply(ds, 2, sample, replace = TRUE)
    success <- all(colSums(x) <= colSums(ds))
  }
  # compute something based on the random matrix that meets the condition
  # and return the value
  y <- mean(x)
  return(y)
}
replicate(1000, {rep(ds)})

Thanks!

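【Discussion】:

Can't you run it in parallel, i.e. column by column?
I also think sampling column by column is a good idea. With the current specification, the chance of success is roughly 1 in 2^20.
I was thinking in that direction, but I don't know where to start: whether there is a way to put the condition into the sample function, or to sample column by column and check the condition, or...
Could you write an example of how to code this? Sorry, I'm new to this.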
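To put a number on the "1 in 2^20" estimate, the sketch below estimates, for each column separately, how often a resampled column's sum stays at or below the original column sum. It assumes the same kind of 6 × 20 matrix as in the question, generated here with an arbitrary seed. Because the columns are resampled independently, the chance that a whole matrix passes the check is the product of the per-column rates.

```r
# Sketch: estimate how often the question's while loop accepts a matrix.
set.seed(42)  # arbitrary seed, for reproducibility only
ds <- matrix(sample(0:1000, 120), ncol = 20)

# For each column, the fraction of resamples (with replacement) whose sum
# does not exceed the original column sum, estimated from 5000 resamples.
p_cols <- apply(ds, 2, function(col) {
  mean(replicate(5000, sum(sample(col, replace = TRUE)) <= sum(col)))
})

# Columns are resampled independently, so the whole matrix is accepted
# with probability equal to the product of the per-column rates.
p_all <- prod(p_cols)

round(p_cols, 2)  # per-column acceptance rates, typically close to 0.5
p_all             # estimated acceptance probability for the whole matrix
1 / p_all         # expected number of while-loop iterations per accepted matrix
```

If the per-column rates come out near 0.5, `p_all` is on the order of 0.5^20 (about 1e-6), so each call needs on the order of a million passes through the `while` loop, and `replicate(1000, ...)` multiplies that by 1000.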

Tags: r performance random repeat

【Solution 1】:

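Here is the idea I described in the comments: suc_samp returns one successful resample of a vector, and my_rep applies this successful sampling to each column (rep is a base R function, so you may want to avoid masking it).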

suc_samp <- function(x) {
  # resample x with replacement until its sum does not exceed the original sum
  repeat {
    x_samp <- sample(x, size = length(x), replace = TRUE)
    if (sum(x_samp) <= sum(x)) break
  }
  return(x_samp)
}

my_rep <- function(ds) {
  x <- apply(ds, 2, suc_samp)
  y <- mean(x)
  return(y)
}

ds <- matrix(sample(0:1000, 120), ncol=20)

replicate(1000, {my_rep(ds)})
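Why this gives the same result as the original loop: the columns are resampled independently, so requiring every column sum of the whole matrix to stay at or below its original sum is equivalent to applying that condition to each column on its own. Each column then needs roughly two attempts on average, instead of roughly 2^20 attempts for the whole matrix. The sketch below is a variant of the same idea, not the answerer's code; `suc_samp_idx` and `my_rep2` are hypothetical names. It samples row indices with `sample.int()` so a one-row matrix is handled correctly (called on a single number `x >= 1`, `sample(x)` draws from `1:x` instead), and it uses `vapply()` to collect the 1000 means.

```r
# Sketch: column-wise rejection sampling as in the answer, with two tweaks.
# suc_samp_idx and my_rep2 are hypothetical names used for illustration.

# Resample one column with replacement until its sum does not exceed the
# original sum. Sampling indices avoids sample()'s special case for a
# single numeric value.
suc_samp_idx <- function(x) {
  target <- sum(x)
  repeat {
    x_samp <- x[sample.int(length(x), replace = TRUE)]
    if (sum(x_samp) <= target) return(x_samp)
  }
}

# Build one accepted matrix column by column and return its mean.
my_rep2 <- function(ds) {
  x <- apply(ds, 2, suc_samp_idx)
  mean(x)
}

set.seed(1)  # arbitrary seed
ds <- matrix(sample(0:1000, 120), ncol = 20)

# vapply() checks that every replicate returns exactly one number.
res <- vapply(seq_len(1000), function(i) my_rep2(ds), numeric(1))
summary(res)
```

Since every accepted column sum is at most the original one, every value in `res` is at most `mean(ds)`.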

【Discussion】: