Creating a df of unique combinations of columns in R where order doesn't matter (answers)

【Question title】: Creating a df of unique combinations of columns in R where order doesn't matter
【Posted】: 2020-01-26 23:27:18
【Question】:

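I want to create a df of all unique combinations of three columns, where the order of the values doesn't matter. In my example, I want to create a list of all the combinations of ideological groups that three people could have.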

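In my example, "No opinion", "Moderate", "Conservative" is equivalent to "Conservative", "No opinion", "Moderate", which is equivalent to "Moderate", "No opinion", "Conservative", and so on. All of these combinations should be represented by a single row.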
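As a quick check on the size of the expected result: treating each group as an unordered selection with repetition (a multiset), three people drawn from six ideology values give choose(6 + 3 - 1, 3) = 56 distinct groups, versus 6^3 = 216 ordered triples. A minimal base R sketch of that arithmetic; the helper name `n_multisets` is made up for illustration:

```r
# Hypothetical helper: number of unordered groups of size k drawn
# with repetition from n distinct values ("stars and bars")
n_multisets <- function(n, k) choose(n + k - 1, k)

n_values   <- 6  # the six ideology labels
group_size <- 3  # three people per group

n_ordered   <- n_values^group_size                # ordered triples, as from crossing()
n_unordered <- n_multisets(n_values, group_size)  # rows wanted in the final df

n_ordered    # 216
n_unordered  # 56
```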

I've seen similar threads that use distinct for home and away sports teams, but I don't think that applies to this problem.

library(tidyverse)

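# No `levels` argument is given, so the levels are sorted alphabetically
# (Conservative < Far left < Far right < Liberal < Moderate < No opinion)
political_spectrum_values <- 
  factor(c("Far left",
           "Liberal",
           "Moderate", 
           "Conservative",
           "Far right",
           "No opinion"), 
         ordered = TRUE)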


# All 6^3 = 216 ordered triples, including every permutation of the same group
political_groups_of_3 <- 
  crossing(first_person = political_spectrum_values, 
           second_person = political_spectrum_values,
           third_person = political_spectrum_values)

I thought about piping into the line below to build some kind of combination variable, but I don't know where to go from there:

unite(col = "group_composition", c(first_person, second_person, third_person), sep = "_")

Edit: after working on this a bit longer, I reshaped the data in a way that might make this easier:

crossing(first_person = political_spectrum_values, 
         second_person = political_spectrum_values,
         third_person = political_spectrum_values) %>% 
  mutate(group_n = row_number()) %>% 
  pivot_longer(cols = c(first_person, second_person, third_person), 
               values_to = "ideology", 
               names_to = "group") %>% 
  select(-group)
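Starting from that long format, one way to finish is to give each group an order-independent key by sorting its three members before pasting them together. The sketch below does this in base R rather than the tidyverse so that it runs on its own; it rebuilds a long table with the same `group_n`/`ideology` columns, and the object names (`wide`, `long`, `keys`) are illustrative:

```r
values <- factor(c("Far left", "Liberal", "Moderate",
                   "Conservative", "Far right", "No opinion"),
                 ordered = TRUE)

# All 216 ordered triples, numbered like mutate(group_n = row_number())
wide <- expand.grid(first_person  = values,
                    second_person = values,
                    third_person  = values)
wide$group_n <- seq_len(nrow(wide))

# Long format: one row per (group, person), similar to the pivot_longer() result
long <- data.frame(
  group_n  = rep(wide$group_n, times = 3),
  ideology = c(as.character(wide$first_person),
               as.character(wide$second_person),
               as.character(wide$third_person))
)

# Order-independent key: sort the members of each group, then paste them
keys <- vapply(split(long$ideology, long$group_n),
               function(x) paste(sort(x), collapse = "_"),
               character(1))
group_composition <- unique(unname(keys))
length(group_composition)  # 56
```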

【Question discussion】:

I'm open as to the expected output; see also my last edit.

Tags: r combinations tidyverse

【Solution 1】:

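Here's a trick you could use. Instead of starting with the names of the political leanings, start with the numbers 5^(0:5), one per factor level. The sum of any combination of three of them is unique: a level can occur at most 3 times in a group, and because 3 * 5^x < 5^(x+1), the sum is simply a base-5 number whose digits are the counts of each level. So if you run expand.grid (the equivalent of crossing) on three copies of this vector and take the row sums, the positions of the unique sums are the same as the positions of the unique name combinations in the crossing result. (expand.grid varies its first column fastest and crossing its last, but that only mirrors each triple, which doesn't change its sum.)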

So you can just do this one-liner:

political_groups_of_3[!duplicated(rowSums(expand.grid(5^(0:5), 5^(0:5), 5^(0:5)))), ]

which gives:

#> # A tibble: 56 x 3
#>    first_person second_person third_person
#>    <ord>        <ord>         <ord>       
#>  1 Conservative Conservative  Conservative
#>  2 Conservative Conservative  Far left    
#>  3 Conservative Conservative  Far right   
#>  4 Conservative Conservative  Liberal     
#>  5 Conservative Conservative  Moderate    
#>  6 Conservative Conservative  No opinion  
#>  7 Conservative Far left      Far left    
#>  8 Conservative Far left      Far right   
#>  9 Conservative Far left      Liberal     
#> 10 Conservative Far left      Moderate    
#> # ... with 46 more rows

Whether this is "more elegant" or just an opaque hack is, of course, a matter of taste...
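To see the encoding in isolation: with weights 5^(0:5), each row sum is a base-5 number whose digits count how often each level occurs, so the 216 row sums collapse to exactly 56 distinct values and any sum can be decoded back into a group. A small base R sketch; `decode_counts` is a hypothetical helper written for this illustration, not part of the answer:

```r
weights <- 5^(0:5)  # weight for level 1, 2, ..., 6 (in level order)

sums <- rowSums(expand.grid(weights, weights, weights))
length(sums)          # 216 ordered triples
length(unique(sums))  # 56 unordered groups

# Hypothetical helper: read the base-5 digits of a sum, i.e. how many
# times each of the n_levels levels occurs in the group
decode_counts <- function(total, n_levels = 6) {
  counts <- numeric(n_levels)
  for (i in seq_len(n_levels)) {
    counts[i] <- total %% 5
    total <- total %/% 5
  }
  counts
}

# 5^0 + 5^0 + 5^2 = 27: level 1 twice and level 3 once
decode_counts(27)  # 2 0 1 0 0 0
```

With the alphabetical levels from the question, level 1 is Conservative and level 3 is Far right, so 27 encodes the group Conservative, Conservative, Far right.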

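【Discussion】:

【Solution 2】:

A base R approach is to use expand.grid to create all combinations of political_spectrum_values taken 3 at a time, sort the values within each row, and keep the unique rows.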

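df <- expand.grid(first_person = political_spectrum_values, 
                  second_person = political_spectrum_values, 
                  third_person = political_spectrum_values)

# Sort the values within each row so that every permutation of a group
# looks identical, then drop the duplicate rows
df[] <- t(apply(df, 1, sort))
unique(df)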

If you need it as a single string:

unique(apply(df, 1, function(x) paste0(sort(x), collapse = "_")))
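With exactly three columns, the row-wise apply(..., sort) can also be replaced by vectorised pmin()/pmax() on the integer level codes, which avoids converting the whole data frame to a character matrix. This is a sketch that only works under that three-column assumption; the intermediate names (`lo`, `mid`, `hi`, `to_factor`) are illustrative:

```r
values <- factor(c("Far left", "Liberal", "Moderate",
                   "Conservative", "Far right", "No opinion"),
                 ordered = TRUE)
df <- expand.grid(first_person  = values,
                  second_person = values,
                  third_person  = values)

# Integer level codes, so pmin()/pmax() work element-wise on plain numbers
x1 <- as.integer(df$first_person)
x2 <- as.integer(df$second_person)
x3 <- as.integer(df$third_person)

lo  <- pmin(x1, x2, x3)
hi  <- pmax(x1, x2, x3)
mid <- x1 + x2 + x3 - lo - hi  # whatever is left over is the middle value

# Map the codes back to ordered factors and keep one row per group
lv <- levels(values)
to_factor <- function(code) factor(lv[code], levels = lv, ordered = TRUE)
sorted_df <- data.frame(first_person  = to_factor(lo),
                        second_person = to_factor(mid),
                        third_person  = to_factor(hi))
res <- unique(sorted_df)
nrow(res)  # 56
```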

【Discussion】:

【Solution 3】:

Here is a two-step solution using gtools::combinations and paste.

library(gtools)
# All combinations with repetition of the 6 values, taken 3 at a time
# (the argument is `repeats.allowed`; `repeats` only worked via partial matching)
combs <- combinations(nlevels(political_spectrum_values),
                      3,
                      as.character(political_spectrum_values),
                      repeats.allowed = TRUE)
# Collapse each row into a single string and store the result in a data.frame
combs <- data.frame(group_composition = apply(combs, 1, paste, collapse = "_"))
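If gtools isn't available, combinations with repetition can also be produced in base R with combn() and the usual "stars and bars" shift: choose 3 distinct positions from 1..(6 + 3 - 1) in increasing order, then subtract 0, 1 and 2 from them, which turns every strictly increasing triple into a non-decreasing triple over 1..6. A minimal sketch; the variable names are illustrative:

```r
# The six labels, sorted alphabetically like the factor levels in the question
lv <- sort(c("Far left", "Liberal", "Moderate",
             "Conservative", "Far right", "No opinion"))
n <- length(lv)  # 6 values
k <- 3           # group size

idx <- combn(n + k - 1, k)     # k x choose(n + k - 1, k) matrix, increasing columns
idx <- idx - (seq_len(k) - 1)  # subtract 0, 1, 2 from rows 1, 2, 3

# One row per group; each row is non-decreasing, so each group appears once
combs <- matrix(lv[idx], ncol = k, byrow = TRUE)
group_composition <- apply(combs, 1, paste, collapse = "_")
length(group_composition)  # 56
```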

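【Discussion】:

【Solution 4】:

Here is an answer that combines my update (the reshaped data) with unite. I'll leave this open a little longer in case someone has a more elegant solution.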

crossing(first_person = political_spectrum_values, 
         second_person = political_spectrum_values,
         third_person = political_spectrum_values) %>% 
  mutate(group_n = row_number()) %>% 
  pivot_longer(cols = c(first_person, second_person, third_person), 
               values_to = "ideology", 
               names_to = "group") %>% 
  select(-group) %>%
  group_by(group_n) %>% 
  arrange(ideology) %>% 
  mutate(person = row_number()) %>% 
  pivot_wider(id_cols = group_n, values_from = ideology, names_from = person) %>% 
  unite(col = "group_composition", c(`1`, `2`, `3`), sep = "_") %>% 
  ungroup() %>% 
  distinct(group_composition)
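Because political_spectrum_values is an ordered factor, another option is to skip the reshaping entirely and keep only the ordered triples whose members are non-decreasing (first_person <= second_person <= third_person); every unordered group has exactly one such representative. The sketch below uses base R's subset() so it runs without extra packages; the same condition should carry over to dplyr::filter() applied to the crossing() result:

```r
values <- factor(c("Far left", "Liberal", "Moderate",
                   "Conservative", "Far right", "No opinion"),
                 ordered = TRUE)

all_triples <- expand.grid(first_person  = values,
                           second_person = values,
                           third_person  = values)

# Ordered factors support <, <=, ... by level order, so this keeps only the
# non-decreasing triple of each unordered group
canonical <- subset(all_triples,
                    first_person <= second_person &
                      second_person <= third_person)
nrow(canonical)  # 56
```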

【Discussion】: