Select random and unique elements from a vector (answers)

【Question title】: Select random and unique elements from a vector
【Posted】: 2018-12-24 14:09:31
【Question description】:

Suppose I have a simple vector with repeated elements:

a <- c(1,1,1,2,2,3,3,3)

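Is there a way to randomly select exactly one element from each group of repeated values? I.e., one random draw would give the following indices of elements to keep: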

1,4,6 ## here I selected the first 1, the first 2 and the first 3

And another draw:

1,5,8 ## here I selected the first 1, the second 2  and the third 3

I could loop over each repeated element, but I'm sure there must be a faster way to do this?

Edit:

Ideally, if an element is already unique, the solution should always select that element. I.e., my vector could also be:

b <- c(1,1,1,2,2,3,3,3,4) ## The number four is unique and should always be drawn

【Discussion】:

Tags: r vector random unique

【Solution 1】:

Using base R's ave, we can do something like

unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 5 6

unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 4 7

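This generates an index for every element of a, groups the indices by the values of a, and then selects one random index within each group.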
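To make the role of unique() clearer, here is a minimal sketch that splits the one-liner into steps. The names pick_one, per_position and idx are just illustrative and not part of the original answer.

```r
a <- c(1, 1, 1, 2, 2, 3, 3, 3)

# Illustrative helper: pick one index per group. The length check guards
# against sample() treating a single number n as the range 1:n.
pick_one <- function(x) if (length(x) > 1) head(sample(x), 1) else x

# ave() returns a vector as long as `a`: every position in a group is
# filled with the single index chosen for that group.
per_position <- ave(seq_along(a), a, FUN = pick_one)

# unique() collapses the repeats to one index per distinct value.
idx <- unique(per_position)

# Whichever indices were drawn, they point at one 1, one 2 and one 3.
a[idx]
```

Since ave() recycles the chosen index across the whole group, unique() is what reduces the result to one index per value.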

The same logic with sapply and split

sapply(split(seq_along(a), a), function(x) if(length(x) > 1) head(sample(x), 1) else x)

It also works with tapply

tapply(seq_along(a), a, function(x) if(length(x) > 1) head(sample(x), 1) else x)
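As a rough check of the requirement from the question's edit (an element that is already unique should always be drawn), the sketch below repeats the split/sapply draw on the vector b from the question. The helper name pick_one and the number of repetitions (200) are arbitrary choices for illustration.

```r
b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)

# Illustrative helper, same logic as the anonymous function above.
pick_one <- function(x) if (length(x) > 1) head(sample(x), 1) else x

# Repeat the draw many times. Each draw is a named vector with one index
# per group, so the result is a matrix with one row per group.
draws <- replicate(200, sapply(split(seq_along(b), b), pick_one))

# The group "4" contains only index 9, so every draw should return 9.
all(draws["4", ] == 9)

# The other groups stay within their own index ranges.
all(draws["1", ] %in% 1:3)
```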

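The reason we need to check the length (if(length(x) > 1)) is this passage from ?sample

If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.

So when sample() is given a single number n, it samples from 1:n rather than returning n itself, which is why we need to check the length.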
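One way to sidestep the length check is to sample a position rather than a value. The sketch below is an alternative to the code above, not the method used in the answer; pick_one_safe is a hypothetical helper name, and the vector is the one from the comments where a value occurs only once.

```r
# A single number n is treated as the range 1:n, so this can return
# any value from 1 to 7, not necessarily 7.
sample(7, 1)

# Hypothetical helper: draw a position with sample.int() and index into x.
# This behaves the same for groups of any size, including size one.
pick_one_safe <- function(x) x[sample.int(length(x), 1)]

pick_one_safe(7)        # the only element, 7
pick_one_safe(c(4, 5))  # either 4 or 5

# Vector where the value 2 occurs only once, at index 4.
a <- c(1, 1, 1, 2, 3, 3, 3)
idx <- sapply(split(seq_along(a), a), pick_one_safe)
idx
```

With this helper the single-element group always yields its own index (4 here), without a separate branch for length one.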

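【Comments】:

Thanks, it works on my toy example, but I just noticed in my data that sometimes not all elements are repeated, and I still want to keep such a single element
I used this vector: a <- c(1,1,1,2,3,3,3) and got the output: [1] 2 7
@RonakShah The sapply and tapply approaches do not work. That is because, when x is a numeric vector of length one, sample(x) means sample(1:x), which leads to unexpected results.
@www Exactly right. I just had to go through the documentation to remember that. Thanks :)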