Combinations of combinations in R: answers

【Question title】: combinations of combinations in R
【Posted】: 2016-11-15 01:59:31
【Question description】:

Suppose I have two vectors

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")

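Each of these will be used to look up a number in another vector. I'm looking for all possible sets of two ratios (that is, all possible sets of four variables, two from each vector), where the numerator always comes from upVariables, the denominator always comes from downVariables, and no variable is used twice within a final set.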

I've got as far as

library(dplyr)  # for arrange()
upCombos <- combn(upVariables, 2)
downCombos <- combn(downVariables, 2)
combos <- arrange(expand.grid(upCombos = upCombos[, 1], downCombos = downCombos[, 1]), upCombos)

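I'm only using the first possible combination here for illustration, but I want to iterate over all possible combinations. This gives me: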

> combos
  upCombos downCombos
1      up1      down1
2      up1      down2
3      up2      down1
4      up2      down2

However, I'd like to make two sets out of this, like:

> combos[1]
  upCombos downCombos
1      up1      down1
2      up2      down2

and

> combos[2]
  upCombos downCombos
1      up1      down2
2      up2      down1

So in each case, each value in upCombos is used only once, and each value in downCombos is used only once. Does that make sense? Any ideas on how to approach this?

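Ideally, I'd like to be able to generalize to sets of 3 sampled from the original vectors instead of sets of 2, but I'd be happy just to get sets of 2 working for now.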
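A base-R sketch of the full enumeration, generalised to sets of k ratios. It assumes that "sets of 3" means every one-to-one pairing of 3 up variables with 3 down variables (3! = 6 pairings per choice of variables). `perms()` and `ratioSets()` are hypothetical helper names, not existing functions. For k = 2 it should yield choose(5, 2)^2 * 2! = 200 sets, the first two being the two arrangements sketched above.

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# Hypothetical helper: all permutations of 1..k, one per row (fine for small k)
perms <- function(k) {
  if (k == 1) return(matrix(1L, nrow = 1, ncol = 1))
  p <- perms(k - 1)
  do.call(rbind, lapply(seq_len(k), function(i) cbind(i, ifelse(p >= i, p + 1L, p))))
}

# Hypothetical helper: every set of k ratios, with k numerators chosen from `up`,
# k denominators chosen from `down`, and every one-to-one pairing between them,
# so no variable appears twice within a set. Each set is a vector of "up_down" strings.
ratioSets <- function(up, down, k = 2) {
  upC <- combn(up, k, simplify = FALSE)
  downC <- combn(down, k, simplify = FALSE)
  P <- perms(k)
  out <- list()
  for (u in upC) {
    for (d in downC) {
      for (r in seq_len(nrow(P))) {
        out[[length(out) + 1]] <- paste(u, d[P[r, ]], sep = "_")
      }
    }
  }
  out
}

sets2 <- ratioSets(upVariables, downVariables, k = 2)
length(sets2)  # choose(5, 2)^2 * factorial(2) = 200
sets2[[1]]     # "up1_down1" "up2_down2"
sets2[[2]]     # "up1_down2" "up2_down1"

sets3 <- ratioSets(upVariables, downVariables, k = 3)
length(sets3)  # choose(5, 3)^2 * factorial(3) = 600
```

Growing `out` one element at a time is simple rather than fast; for larger inputs the length is known in advance (choose(n, k)^2 * k!), so the list could be preallocated.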

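**EDIT: So Jota has provided a solution that gives the arrangements for any one set of 4 variables (2 from upVariables, 2 from downVariables). However, I still can't see how to iterate over all possible sets of 4 variables. The closest I've got is putting Jota's suggestion inside two for loops (you can tell I'm not an R programmer yet). This returns fewer combinations than it should.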

n <- 2
offset <- n - 1
ratioGroups <- c()  # collects the results from every iteration
for (i in 1:(length(upVariables) - offset)) {
  for (j in 1:(length(downVariables) - offset)) {
    combos <- expand.grid(upVariables[i:(i + offset)], downVariables[j:(j + offset)])
    combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
    mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
    # use k here so the inner loop doesn't overwrite the outer loop's j
    for (k in 2:nrow(mat)) mat[k, ] <- mat[k, c(k:ncol(mat), 1:(k - 1))]
    pairs <- split(combos[c(mat), ], rep(1:n, each = n))
    collapsed <- sapply(lapply(pairs, apply, 1, paste, collapse = '_'), paste, collapse = '-')
    ratioGroups <- c(ratioGroups, collapsed)
  }
}

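This only returns 16 sets of variables (each with 2 combinations, so 32 in total). With 5 variables in each vector, though, there are many more possibilities.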
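A quick count shows where the missing combinations go, assuming the loop as written: `upVariables[i:(i + offset)]` only ever picks adjacent variables, whereas `combn()` picks every pair, including non-adjacent ones.

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
n <- 2
offset <- n - 1

# The loop's windows are only adjacent pairs: up1/up2, up2/up3, up3/up4, up4/up5
windows <- lapply(seq_len(length(upVariables) - offset),
                  function(i) upVariables[i:(i + offset)])
length(windows)  # 4, so 4 x 4 = 16 sets of four variables

# combn() also yields non-adjacent pairs such as up1/up3
pairs <- combn(upVariables, n, simplify = FALSE)
length(pairs)    # choose(5, 2) = 10, so 10 x 10 = 100 sets of four variables

# each set of four variables has factorial(n) = 2 pairings
length(pairs)^2 * factorial(n)  # 200 ratio sets
```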

【Question discussion】:

Tags: r

【Solution 1】:

You can use expand.grid to create the combinations, then use regular expressions to prepare the subsets:

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")

DF = expand.grid(upVariables,downVariables)

DF$suffix1 = as.numeric(unlist(regmatches(DF$Var1,gregexpr("[0-9]+",DF$Var1))))

DF$suffix2 = as.numeric(unlist(regmatches(DF$Var2,gregexpr("[0-9]+",DF$Var2))))

head(DF)
#  Var1  Var2 suffix1 suffix2
#1  up1 down1       1       1
#2  up2 down1       2       1
#3  up3 down1       3       1
#4  up4 down1       4       1
#5  up5 down1       5       1
#6  up1 down2       1       2



DF_Comb1 = DF[DF$suffix1==DF$suffix2,]
DF_Comb2 = DF[DF$suffix1!=DF$suffix2,]

DF_Comb1
#    Var1  Var2 suffix1 suffix2
# 1   up1 down1       1       1
# 7   up2 down2       2       2
# 13  up3 down3       3       3
# 19  up4 down4       4       4
# 25  up5 down5       5       5


head(DF_Comb2)
#  Var1  Var2 suffix1 suffix2
# 2  up2 down1       2       1
# 3  up3 down1       3       1
# 4  up4 down1       4       1
# 5  up5 down1       5       1
# 6  up1 down2       1       2
# 8  up3 down2       3       2
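A slightly shorter base-R variant of the same idea, assuming every name is letters followed by a number: `sub()` strips the leading letters in one step, and `stringsAsFactors = FALSE` keeps `Var1`/`Var2` as character instead of factors.

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# keep the columns as character rather than factor
DF <- expand.grid(upVariables, downVariables, stringsAsFactors = FALSE)

# drop the leading non-digits, leaving the trailing number
DF$suffix1 <- as.integer(sub("^[^0-9]+", "", DF$Var1))
DF$suffix2 <- as.integer(sub("^[^0-9]+", "", DF$Var2))

DF_Comb1 <- DF[DF$suffix1 == DF$suffix2, ]  # 5 matching pairs
DF_Comb2 <- DF[DF$suffix1 != DF$suffix2, ]  # the other 20 pairs
```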

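【Discussion】:

【Solution 2】:

So I think I may have cracked it. I've pillaged a couple of answers to other questions. There's a function called expand.grid.unique here that removes duplicates if you pass the same vector to expand.grid twice. There's also one here called expand.grid.df, which extends expand.grid to handle data frames (I won't even pretend to understand how). Combined, though, they do what I want.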

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")
ratioGroups<-data.frame(matrix(ncol=2, nrow=0))
colnames(ratioGroups)<-c("mix1","mix2")

ups<-expand.grid.unique(upVariables,upVariables)
downs<-expand.grid.unique(downVariables,downVariables)
comboList<-expand.grid.df(ups,downs)
comboList <- data.frame(lapply(comboList, as.character), stringsAsFactors=FALSE)
colnames(comboList)<-c("u1","u2","d1","d2")

For some reason everything gets converted to factors, hence all the hassle of converting everything back to strings there.
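expand.grid.unique and expand.grid.df are helpers from other Stack Overflow answers, not base R. A base-R sketch of the same 100-row table, assuming expand.grid.unique returns each unordered pair of distinct elements once, uses `combn()` for the pairs and `merge(..., by = NULL)` for the cross join. Note that `expand.grid()` itself defaults to `stringsAsFactors = TRUE`, which may be where the factors come from; this version keeps everything as character.

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# every unordered pair of distinct variables, one pair per row
ups <- as.data.frame(t(combn(upVariables, 2)), stringsAsFactors = FALSE)
downs <- as.data.frame(t(combn(downVariables, 2)), stringsAsFactors = FALSE)
names(ups) <- c("u1", "u2")
names(downs) <- c("d1", "d2")

# by = NULL makes merge() return the Cartesian product of the two data frames
comboList <- merge(ups, downs, by = NULL)
nrow(comboList)  # 10 x 10 = 100
```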

If I put Jota's answer into a function:

getGroups <- function(line) {
  n <- 2  # the number of ratios being used
  combos <- expand.grid(as.character(line[1:2]), as.character(line[3:4]))
  combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
  mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
  for (j in 2:nrow(mat)) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]
  pairs <- split(combos[c(mat), ], rep(1:n, each = n))
  # return the n groups, each collapsed to one string such as "up1_down1-up2_down2"
  sapply(lapply(pairs, apply, 1, paste, collapse = '_'), paste, collapse = '-')
}

then I can use

ratiosGroups<-as.vector(apply(comboList,1,getGroups))

to return a list of all possible combinations. I suspect this still isn't the best way to achieve my larger goal, but it works.
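For n = 2 specifically, a row laid out as (u1, u2, d1, d2) has exactly two one-to-one pairings, so the matrix-rotation step can be written out by hand. `getGroups2()` is a hypothetical simplification, not part of the answer above; for the rows tried below it produces the same strings as getGroups, but it does not generalise to n = 3.

```r
# Hypothetical n = 2 shortcut: pair (u1, d1)(u2, d2) and (u1, d2)(u2, d1)
getGroups2 <- function(line) {
  line <- as.character(line)
  c(paste(paste(line[1], line[3], sep = "_"), paste(line[2], line[4], sep = "_"), sep = "-"),
    paste(paste(line[1], line[4], sep = "_"), paste(line[2], line[3], sep = "_"), sep = "-"))
}

# two example rows in the same u1, u2, d1, d2 layout as comboList
rows <- rbind(c("up1", "up2", "down1", "down2"),
              c("up1", "up3", "down2", "down5"))
ratioGroups <- as.vector(apply(rows, 1, getGroups2))
ratioGroups
# "up1_down1-up2_down2" "up1_down2-up2_down1" "up1_down2-up3_down5" "up1_down5-up3_down2"
```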

【Discussion】:

【Solution 3】:

Here is my answer based on the comments and the edited question.

# create combos and order them according to the first variable
combos <- expand.grid(upVariables[1:2], downVariables[1:2])
combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
# if names are important, set them:
# names(combos) <- c("upCombos", "downCombos")

# create a matrix to use to sort combos
mat <- matrix(1:2^2, byrow = TRUE, nrow = 2)
# take some code from Carl Witthoft to shift the above matrix
# from: http://stackoverflow.com/a/24144632/640595
for(j in 2:nrow(mat) ) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]

# use the matrix to sort combos, and then conduct the splitting
initialResult <- split(combos[c(mat), ], rep(1:2, each = 2))

$`1`
  Var1  Var2
1  up1 down1
4  up2 down2

$`2`
  Var1  Var2
3  up1 down2
2  up2 down1
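The rotation loop above can be wrapped for any n. `shiftMat()` is a hypothetical helper; it uses `seq_len(n)[-1]`, which is empty when n = 1, whereas `2:nrow(mat)` would count down to `c(2, 1)`. Each column of the result indexes one group of the sorted combos. Note that for n = 3 this gives the 3 cyclic shifts (a partition of the 9 pairs into 3 groups), not all 3! = 6 one-to-one pairings; for n = 2 the two coincide.

```r
# Hypothetical helper: the row-rotation matrix used to split the sorted combos
shiftMat <- function(n) {
  mat <- matrix(seq_len(n^2), byrow = TRUE, nrow = n)
  for (j in seq_len(n)[-1]) mat[j, ] <- mat[j, c(j:n, seq_len(j - 1))]
  mat
}

shiftMat(2)  # columns (1, 4) and (2, 3): the two groups shown above
shiftMat(3)
# column 1 -> rows 1, 5, 9: up1_down1, up2_down2, up3_down3
# column 2 -> rows 2, 6, 7: up1_down2, up2_down3, up3_down1
# column 3 -> rows 3, 4, 8: up1_down3, up2_down1, up3_down2
```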

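To generate the rest of the combinations, we can iterate over the columns of upCombos and downCombos and substitute the up and down variables: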

# convert from factor to character for easier manipulation
initialResult <- lapply(initialResult, sapply, as.character)

# swap each placeholder name for its counterpart in a given combo using an exact
# lookup, so one substitution can't feed into the next (a sequential replace such
# as down1 -> down2 followed by down2 -> down3 would turn both into down3)
swapNames <- function(x, from, to) {
  idx <- match(x, from)
  x[!is.na(idx)] <- to[idx[!is.na(idx)]]
  x
}

# iterate through the columns of upCombos
intermediateResult <- lapply(seq_len(ncol(upCombos)),
    function(ii) {
        jj <- swapNames(unlist(initialResult), c("up1", "up2"), upCombos[, ii])
        relist(jj, initialResult)})

# iterate through the columns of downCombos
finalResult <- lapply(seq_len(ncol(downCombos)),
    function(ii) {
        jj <- swapNames(unlist(intermediateResult), c("down1", "down2"), downCombos[, ii])
        relist(jj, intermediateResult)})

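【Discussion】:

This gets all the arrangements for any given set of variables. How would I change it to iterate over all possible combinations of up1:up5 and down2:down5? My first thought was two for loops, i.e. for (i in 1:(length(upVariables)-offset)){ etc. But a) that doesn't seem very R-like, and I'm sure there's a better way to do it, and b) it doesn't seem to produce as many combinations as I'd expect.
I'm referring to the part at the top of my question where I mention that I have two sets of variables, upVariables and downVariables, and I want every possible combination of 4 variables (2 from each). I used the first set of four to demonstrate what I want to do with each individual set. Your answer works perfectly for arranging the variables within each set of 4. I thought I could extend it to iterate over all possible sets, but I can't seem to manage it. I'll go back and see if I can phrase the question better.
I... might have something. Will post an answer.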