Sample random rows in a data frame where the number of samples exceeds the number of rows, and assign sampling probabilities

【Question title】: Sample random rows in dataframe, where number of samples exceeds number of rows. Assign sampling probability
【Posted】: 2017-05-19 02:14:21
【Question】:

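Consider the following sample data, stored in a data frame named df

As you can see, this data frame has 3 rows. What I would like to do is draw a sample of 100 rows, where each row has the same probability of being selected (1/3 in this case). My output, call it df_result, would look like this: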

df_result
x  y
0  8
2  4
0  8
1  5
1  5
2  4

and so on, until 100 samples have been drawn.

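I saw this previous Stack Overflow post, which details how to draw a random sample from a data frame: df[sample(nrow(df), 3), ]

However, when I try to sample 100 rows, this (predictably) does not work, and it does not let me assign sampling probabilities.

Any suggestions?

Thanks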

【Comments】:

df[sample(nrow(df),100,replace=TRUE),]
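@HubertL Thanks. When I try to set the prob=c(rep(1/3,3)) argument in the sample function, I get an error: "incorrect number of probabilities". Does the sample function assign equal weights automatically?
I don't know why... it works with df[sample(3,100,replace=TRUE,prob=c(rep(1/3,3))),]
modelr::resample (e.g. modelr::resample(df, sample(nrow(df), 100, replace = TRUE))) is useful for this at scale, because it stores only a pointer and the indices rather than redundant data. To expand it into a data.frame, pass it to as.data.frame, although models can work with it directly.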
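
The two points raised in these comments can be checked with a short sketch. It assumes df is the 3-row data frame from the answer; the seed and the deliberately short prob vector are illustrative choices, not taken from the thread. When prob is omitted, sample() draws every index with equal probability, and a prob vector whose length differs from the number of items being sampled is one way to trigger the "incorrect number of probabilities" error (whether that was the cause in the asker's session is not known).

```r
# Assumed to match the data frame used in the answer.
df <- data.frame(x = c(2, 1, 0), y = c(4, 5, 8))

set.seed(42)  # arbitrary seed, only for reproducibility

# With prob omitted, every row index is drawn with equal probability.
idx_default <- sample(nrow(df), 100, replace = TRUE)

# Spelling the equal weights out is allowed, as long as there is one per row.
idx_explicit <- sample(nrow(df), 100, replace = TRUE,
                       prob = rep(1 / nrow(df), nrow(df)))

# A prob vector of the wrong length (2 weights for 3 rows) raises an error;
# tryCatch captures the message instead of stopping the script.
err <- tryCatch(
  sample(nrow(df), 100, replace = TRUE, prob = rep(1 / 3, 2)),
  error = function(e) conditionMessage(e)
)
print(err)
```

Writing the weights as rep(1 / nrow(df), nrow(df)) rather than hard-coding 3 keeps the length of prob tied to the data frame, so the call keeps working if rows are added or removed.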

Tags: r random

【Solution 1】:

df <- read.table(header = TRUE,
                 text = "x  y
2  4
1  5
0  8")

set.seed(1)
# Sample row indices with replacement, so more rows than nrow(df) can be drawn
df[sample(nrow(df), 10, replace = TRUE), ]

    x y
1   2 4
2   1 5
2.1 1 5
3   0 8
1.1 2 4
3.1 0 8
3.2 0 8
2.2 1 5
2.3 1 5
1.2 2 4
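
The answer above uses equal probabilities. For the second half of the question, assigning sampling probabilities, a minimal sketch might look like the following. The weights in w are hypothetical values chosen for illustration; the only requirement relied on here is that prob has one entry per row.

```r
df <- data.frame(x = c(2, 1, 0), y = c(4, 5, 8))

# Hypothetical weights, one per row of df (not from the original post).
w <- c(0.5, 0.3, 0.2)

set.seed(1)
idx <- sample(nrow(df), 100, replace = TRUE, prob = w)
df_result <- df[idx, ]

# Repeated indexing produces row names such as "2.1"; reset them if unwanted.
rownames(df_result) <- NULL

nrow(df_result)

# Empirical share of each original row, for comparison against w.
prop.table(table(factor(idx, levels = seq_len(nrow(df)))))
```

Keeping idx as a separate vector makes it easy to compare the drawn frequencies with the requested weights before building df_result.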

【Comments】: