Separate one data frame into two data frames with 70% and 30% of the original content [duplicate]

【Question title】: Separate one data frame in two data frames with 70% and 30% of the original content [duplicate]
【Posted】: 2017-04-12 10:36:54
【Question】:

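I want to split a data frame in two using R: for example, one data frame with 70% of the original content and another with the remaining 30%. How can I do that? My data frame has dimensions (22740, 2).

One column of my data frame contains genes, and the other contains the pathway each gene belongs to. I want to keep the 70/30 ratio within every pathway in the data frame, so simply taking the first 70% of the rows to build a new data frame is not what I am after.

I hope I have explained this clearly.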

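【Question comments】:

Tags: r

【Solution 1】:

Using dplyr: df2 is the 70% and df3 is the 30%. A ref column is created to index the rows, and group_by ensures that each pathway is sampled separately.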

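library(dplyr)
# Tag each row with its index, then sample 70% of the rows within each pathway
df2 <- df %>% mutate(ref = seq_len(nrow(df))) %>% group_by(pathway) %>% sample_frac(0.7)
# The remaining 30%: every row whose index was not sampled
df3 <- df[-df2$ref, ]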
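
The same per-pathway split can also be sketched in base R, which may help if dplyr is not available. This is only an illustration under stated assumptions: the toy data frame, its column names (gene, pathway), and the names idx_by_pathway, train_idx, df70 and df30 are hypothetical and not from the original post.

```r
# Toy data (hypothetical): 3 pathways with 10, 20 and 30 genes
set.seed(1)
df <- data.frame(
  gene    = paste0("g", 1:60),
  pathway = rep(c("A", "B", "C"), times = c(10, 20, 30)),
  stringsAsFactors = FALSE
)

# Row indices of df, grouped by pathway
idx_by_pathway <- split(seq_len(nrow(df)), df$pathway)

# Within each pathway, draw 70% of its row indices (rounded).
# sample.int() is used on the group size so that a pathway with a
# single row is not misread by sample() as the range 1:x.
train_idx <- unlist(lapply(idx_by_pathway, function(i) {
  i[sample.int(length(i), size = round(0.7 * length(i)))]
}), use.names = FALSE)

df70 <- df[train_idx, ]
# setdiff() keeps this correct even if train_idx were empty
df30 <- df[setdiff(seq_len(nrow(df)), train_idx), ]

# Per-pathway counts in each part
table(df70$pathway)
table(df30$pathway)
```

With very small pathways the rounding matters: a pathway with one or two rows cannot be split exactly 70/30, so it is worth checking the per-pathway counts on the real data.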

【Comments】:

【Solution 2】:

If you want to select 30% of the samples at random, you can do this:

   # Randomly select 30% of the row indices, without replacement
     Sel.ID <- sample(nrow(Tab), size = round(0.3 * nrow(Tab)), replace = FALSE)
   # The new table with 30% of the samples
     New.Tab.30 <- Tab[Sel.ID, ]
   # The table with the remaining 70% of the samples
     New.Tab.70 <- Tab[-Sel.ID, ]

Each run produces a different pair of tables. If you want the result to stay the same between runs, call set.seed(12345) before the first line.
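
The effect of set.seed can be checked with a small sketch. The 100-row Tab below is a hypothetical stand-in for the real table, and the names first and second are introduced only for illustration.

```r
# Hypothetical stand-in for the real table
Tab <- data.frame(id = 1:100)
n <- nrow(Tab)

# Same seed, same draw
set.seed(12345)
first <- sample(n, size = round(0.3 * n), replace = FALSE)
set.seed(12345)
second <- sample(n, size = round(0.3 * n), replace = FALSE)

identical(first, second)  # TRUE: re-seeding reproduces the selection

# Sampling without replacement never repeats a row index
anyDuplicated(first)      # 0
```

The seed has to be set immediately before the call to sample; any other random draw in between changes which rows are selected.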

【Comments】:

I think the topic starter needs replace = FALSE
@GregoryDemin You are right; otherwise duplicate values could be drawn. I have edited the answer.