Selecting a random row from a data.frame and assigning it to one of two other data.frames based on three conditions in R: answers

【Question title】: Selecting random row from a data.frame and assigning it to one of the two other data.frames based on three conditions in R
【Posted】: 2015-05-22 05:24:33
【Question description】:

I have a data.frame (a), shown below:

   V1 V2
1   a  b
2   a  e
3   a  f
4   b  c
5   b  e
6   b  f
7   c  d
8   c  g
9   c  h
10  d  g
11  d  h
12  e  f
13  f  g
14  g  h

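Assume each row represents an edge of a graph, and the values in the row are its vertices.

What I want is to pick a random row (i.e. an edge) from data.frame (a) and assign it to either data.frame (b) or data.frame (c) according to the three conditions below. To clarify, data.frames (b) and (c) are empty at the start. The conditions are: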

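When a random row (edge) is picked from data.frame (a), if neither of its two vertices has been assigned yet, assign the edge to the data.frame with the fewest rows.

To clarify this case: suppose I pick random row (edge) #2 from data.frame (a), which has the two vertices "a" and "e". I should then check whether "a" or "e" exists in any row of data.frame (b) or data.frame (c). If "a" or "e" does exist there, this rule should not be applied and the next rule should be checked. If neither data.frame has "a" or "e" in any row, nrow() (the number of rows) of the two data.frames should be compared, and the row should be assigned to the one with the smaller nrow(). If both have the same nrow(), the row can be assigned to either of the two data.frames.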
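A minimal sketch of this first rule, assuming the columns are named V1 and V2 as in the data above. The helper names `has_vertex` and `rule_unassigned` and the example contents of `dfb` are made up for illustration and are not from the original post. Ties are resolved in favour of (b) here, which the question allows.

```r
# Hypothetical helper: TRUE if vertex v occurs in either column of df.
has_vertex <- function(df, v) any(df$V1 == v) || any(df$V2 == v)

# Hypothetical helper: returns "b" or "c" when the first rule applies,
# and NA when one of the vertices is already assigned somewhere.
rule_unassigned <- function(edge, dfb, dfc) {
  seen <- has_vertex(dfb, edge$V1) || has_vertex(dfb, edge$V2) ||
          has_vertex(dfc, edge$V1) || has_vertex(dfc, edge$V2)
  if (seen) return(NA_character_)            # rule does not apply; check the next rule
  if (nrow(dfb) <= nrow(dfc)) "b" else "c"   # fewer rows wins; ties go to b
}

# Made-up state: (b) already holds one edge, (c) is still empty.
dfb  <- data.frame(V1 = "c", V2 = "d", stringsAsFactors = FALSE)
dfc  <- data.frame(V1 = character(0), V2 = character(0), stringsAsFactors = FALSE)
edge <- data.frame(V1 = "a", V2 = "e", stringsAsFactors = FALSE)   # row #2 of (a)

target  <- rule_unassigned(edge, dfb, dfc)
blocked <- rule_unassigned(data.frame(V1 = "c", V2 = "g", stringsAsFactors = FALSE), dfb, dfc)
```

In this state neither "a" nor "e" has been assigned and (c) has fewer rows, so `target` names (c). The edge c-g touches a vertex already present in (b), so `blocked` is NA and the next rule would be checked.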

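When a random row (edge) is picked from data.frame (a), if one vertex of that row is present in either data.frame (b) or (c), assign the row (edge) to that data.frame.

Suppose a random row is picked, e.g. #3, which has "a" and "f". Then data.frames (b) and (c) should be checked to see whether any row contains "a" or "f". Suppose data.frame (b) contains neither "a" nor "f", but data.frame (c) contains "f". Then the row should be assigned to data.frame (c). It is also possible that data.frame (b) contains "a" while data.frame (c) contains "f". In that case, all instances of "a" in data.frame (b) and of "f" in data.frame (c) should be counted. If "a" occurs 3 times and "f" occurs 4 times, the row should be assigned to (b); that is, the row goes to the data.frame in which its vertex has fewer instances.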
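The tie-breaking part of this rule can be sketched as follows. This assumes the split case described above, with V1 found only in (b) and V2 found only in (c). The helper names `count_vertex` and `rule_split` and the filler vertices in the example are hypothetical. The question does not say what to do when the counts are equal, so sending ties to (b) is an assumption.

```r
# Hypothetical helper: number of times vertex v occurs in df, over both columns.
count_vertex <- function(df, v) sum(df$V1 == v) + sum(df$V2 == v)

# Hypothetical helper for the split case: V1 is in dfb and V2 is in dfc.
# The edge goes to the data.frame in which its vertex occurs less often.
rule_split <- function(edge, dfb, dfc) {
  n_b <- count_vertex(dfb, edge$V1)
  n_c <- count_vertex(dfc, edge$V2)
  if (n_b <= n_c) "b" else "c"   # ties go to b (an assumption)
}

# Made-up contents: "a" occurs 3 times in (b) and "f" occurs 4 times in (c).
dfb <- data.frame(V1 = c("a", "a", "a"),
                  V2 = c("x", "y", "z"), stringsAsFactors = FALSE)
dfc <- data.frame(V1 = c("f", "f", "f", "f"),
                  V2 = c("p", "q", "r", "s"), stringsAsFactors = FALSE)
edge <- data.frame(V1 = "a", V2 = "f", stringsAsFactors = FALSE)   # row #3 of (a)

target <- rule_split(edge, dfb, dfc)
```

With 3 occurrences against 4, `target` names (b), matching the example in the question.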

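When a random row (edge) is picked from data.frame (a), if both vertices of the row are present in one data.frame, assign the row to that data.frame.

To summarize, a random row should be picked from data.frame (a), checked against the conditions above, and then assigned to data.frame (b) or (c). The conditions must therefore be checked for every row of data.frame (a).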

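【Question discussion】:

Is this the same question as here? Just revised for clarity?
@Brayan Yes and no. This is a more detailed question about the same data set. The previous question was only about selecting rows at random in a loop. This one is about selecting at random and then comparing the selected row against several conditions (described above) to see which one matches. However, if someone answers this question, my previous question will be answered as well.
You are asking people to write a lot of code, which is not in the spirit of SO. Normally you should provide the code you have written that does not work the way you want. You have stated your problem very well. Start writing code that handles each step. Your first step is to create the original data frame and the two empty data frames, then select a row at random. That would be a good starting point. By the way, at what point do you consider the random selection of rows complete?
I did some initial steps to create the data.frames and populate them by loading a csv file. What I originally wanted to know was how to select a random row from data.frame (a) and assign it to data.frame (b), then select another random row from (a) and assign it to (c), i.e. select random rows from (a) (without selecting the same row twice) and assign them one by one to (b) and (c) using a loop or something similar, but nobody gave me an answer that does this one by one. Once all rows of (a) have been covered, i.e. selected and assigned to (b) or (c), the process should end.
I created a for loop that draws a random row from (a) and assigns it to (b), then samples again and assigns it to (c), but the sample command produces duplicate rows. Even without replacement, calling sample again and again in a loop gives me duplicates. I passed FALSE to the sample command, but FALSE only applies to the sample generated within one particular call. And since I am using sample to get just one row, FALSE is not even needed; the problem is that when I run the same sample command several times, there is no guarantee that the selected rows will not repeat.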
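The behaviour described in the last comment can be illustrated with a short sketch. The seed value and the variable names are arbitrary choices for illustration. The point is that `replace = FALSE` only governs the draws made within a single call to `sample`, so separate calls know nothing about each other.

```r
set.seed(42)   # arbitrary seed, only so the run is repeatable
n <- 14        # number of rows in data.frame (a)

# One call without replacement: every index appears exactly once, in random order.
order_once <- sample(n)

# Fourteen separate calls: each draw is independent, so indices may repeat.
order_loop <- vapply(seq_len(n), function(i) sample(n, 1), integer(1))

anyDuplicated(order_once)   # 0, because a permutation never contains duplicates
```

Looping over `order_once` visits each row of (a) once, whereas `order_loop` gives no such guarantee.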

Tags: r random dataframe

【Solution 1】:

# Read the edge list: each row is an edge, V1 and V2 are its two vertices.
aCopy <- read.table("isnodes.txt")
# Start p1, p2 and the draw history as empty data frames with aCopy's columns.
p1 <- aCopy[0, ]
p2 <- aCopy[0, ]
currentRowHistory <- aCopy[0, ]

# TRUE if vertex v appears in either column of df.
hasVertex <- function(v, df) any(df$V1 == v) || any(df$V2 == v)
# Number of times vertex v appears in df, counting both columns.
countVertex <- function(v, df) sum(df$V1 == v) + sum(df$V2 == v)

# seq_len(nrow(aCopy)) is evaluated once, before rows start being removed.
for (i in seq_len(nrow(aCopy))) {
    # Draw one remaining row, record it, and drop it so it cannot be drawn again.
    idx <- sample(nrow(aCopy), 1)
    currentRow <- aCopy[idx, ]
    currentRowHistory <- rbind(currentRow, currentRowHistory)
    currentRowV1 <- as.character(currentRow$V1[1])
    currentRowV2 <- as.character(currentRow$V2[1])
    aCopy <- aCopy[-idx, ]

    if (hasVertex(currentRowV1, p1)) {
        if (hasVertex(currentRowV2, p1)) {
            # Both vertices are already in p1.
            p1 <- rbind(currentRow, p1)
            result <- "case 1 assign it to p1"
        } else if (hasVertex(currentRowV2, p2)) {
            # V1 is in p1 and V2 is in p2: choose the side with fewer occurrences.
            V1occurrences <- countVertex(currentRowV1, p1)
            V2occurrences <- countVertex(currentRowV2, p2)
            if (V1occurrences < V2occurrences) {
                p1 <- rbind(currentRow, p1)
            } else {
                p2 <- rbind(currentRow, p2)
            }
            result <- "case 2"
        } else {
            # Only V1 is known, and it is in p1.
            p1 <- rbind(currentRow, p1)
            result <- "case 3 assign it to p1"
        }
    } else if (hasVertex(currentRowV1, p2)) {
        if (hasVertex(currentRowV2, p2)) {
            # Both vertices are already in p2.
            p2 <- rbind(currentRow, p2)
            result <- "case 1 assign it to p2"
        } else if (hasVertex(currentRowV2, p1)) {
            # V1 is in p2 and V2 is in p1: choose the side with fewer occurrences.
            V1occurrencesInP2 <- countVertex(currentRowV1, p2)
            V2occurrencesInP1 <- countVertex(currentRowV2, p1)
            if (V1occurrencesInP2 < V2occurrencesInP1) {
                p2 <- rbind(currentRow, p2)
            } else {
                p1 <- rbind(currentRow, p1)
            }
            result <- "case 2"
        } else {
            # Only V1 is known, and it is in p2.
            p2 <- rbind(currentRow, p2)
            result <- "case 3 assign it to p2"
        }
    } else if (hasVertex(currentRowV2, p1)) {
        # Only V2 is known, and it is in p1.
        p1 <- rbind(currentRow, p1)
        result <- "Assign it to p1 case 3"
    } else if (hasVertex(currentRowV2, p2)) {
        # Only V2 is known, and it is in p2.
        p2 <- rbind(currentRow, p2)
        result <- "Assign it to p2 case 3"
    } else {
        # Neither vertex is assigned yet: use the data frame with fewer rows.
        if (nrow(p1) < nrow(p2)) {
            p1 <- rbind(currentRow, p1)
        } else {
            p2 <- rbind(currentRow, p2)
        }
    }
}

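【Discussion】:

【Solution 2】:

This should get you started. As you have discovered, you cannot keep selecting rows at random, because that leads to duplicates. Instead, randomly assign the rows to a vector that gives the order in which they should be processed. If you think that is not the right approach, you can also select a row at random, remove it from a, and then select at random from what remains. If you still need a, remove the row from a copy of a.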

set.seed(1)
dfa <- data.frame(V1 = sample(letters[1:9], replace = TRUE), V2 = sample(letters[1:9], replace = TRUE))

todo <- sample(1:nrow(dfa), nrow(dfa), replace = FALSE)

dfb <- dfa[todo[1],]
dfc <- dfa[todo[2],]

Now continue through the "todo" vector in order, applying your conditions and using rbind to add rows to dfb and dfc:

for (i in 3:length(todo)) {

    # apply your logic
    # if a row belongs in dfb, do
    dfb <- rbind(dfb, dfa[todo[i],])
    # etc
}
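The alternative mentioned at the start of this answer, drawing a row and then removing it from a copy, might look like the sketch below. The small `dfa` here is made-up example data, and the names `remaining` and `picked` are hypothetical. The assignment logic itself is left as a placeholder comment.

```r
set.seed(1)
dfa <- data.frame(V1 = c("a", "a", "b", "c"),
                  V2 = c("b", "e", "c", "d"),
                  stringsAsFactors = FALSE)

remaining <- dfa                  # work on a copy so dfa itself stays intact
picked <- integer(nrow(dfa))      # original row numbers, in the order drawn

for (k in seq_len(nrow(dfa))) {
  idx <- sample(nrow(remaining), 1)                    # position among the rows still left
  picked[k] <- as.integer(rownames(remaining)[idx])    # row names keep the original numbering
  # apply your logic to remaining[idx, ] here
  remaining <- remaining[-idx, , drop = FALSE]         # remove it so it cannot be drawn again
}
```

After the loop `remaining` is empty and `picked` holds every row number of `dfa` exactly once, which is the same guarantee the `todo` vector gives in a single call.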

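【Discussion】:

Thanks. Just wondering, were you the one who voted the question down to -2? Anyway, thanks a lot for getting me started on the right track.
No, I never downvote. Although you had no code, you did write out clearly what you wanted, which helps. For me, drawing a picture usually helps.
Thanks a lot. What does set.seed(1) do? I went through several data.frame posts on SO and found coders using set.seed() when picking random numbers or samples, but the R help for set.seed() does not explain clearly what it does, or I could not understand it from the explanation R gives.
Yes, it has a cryptic help page. It simply ensures that when different people run "random" code they get the same answer, which makes troubleshooting easier because everyone sees the same results.
FYI, building objects in a for loop (reallocating memory) is bad practice in R and should be avoided at all costs. For background, see here.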
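One way to follow the last comment's advice is to record each row's destination in a preallocated vector and build both data frames once at the end, as in the sketch below. The small `dfa` is made-up example data, and the balancing rule inside the loop is only a placeholder assumption standing in for the three conditions from the question.

```r
set.seed(1)
dfa <- data.frame(V1 = c("a", "a", "b", "c"),
                  V2 = c("b", "e", "c", "d"),
                  stringsAsFactors = FALSE)
todo <- sample(nrow(dfa))   # random processing order, each row exactly once

# Preallocate one label per row of dfa instead of growing dfb and dfc with rbind.
target <- character(nrow(dfa))
for (i in todo) {
  # Placeholder rule (an assumption): send the row to whichever side has fewer rows.
  target[i] <- if (sum(target == "b") <= sum(target == "c")) "b" else "c"
}

# Build both data frames once, after the loop.
dfb <- dfa[target == "b", ]
dfc <- dfa[target == "c", ]
```

Inside the loop, the rows assigned so far are available as `dfa[target == "b", ]` and `dfa[target == "c", ]`, so the vertex checks can still be made without calling rbind. Note that dfb and dfc end up in the row order of dfa rather than in draw order.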