Test for new id combinations in R: answers

【Question title】: Test for new id combinations in R
【Posted】: 2021-08-19 14:42:57
【Question】:

I want to create an indicator that checks whether a group uses a new combination of numbers. I have a dataset like this:

combinations <- data.frame(combination_id = c(1, 1, 1, 1, 
                                 2, 2, 2, 
                                 3,
                                 4, 
                                 5, 5, 5, 5,
                                 6, 6, 6),

                    number = c(20, 10, 12, 18,
                                20, 10, 12,
                                20,
                                40,
                                20, 10, 30, 18,
                                18, 30, 10))

What I want is the following:

dataset_2 <- data.frame(combination_id = c(1, 1, 1, 1, 
                                 2, 2, 2, 
                                 3,
                                 4, 
                                 5, 5, 5, 5,
                                 6, 6, 6),

                    number = c(20, 10, 12, 18,
                                20, 10, 12,
                                20,
                                40,
                                20, 10, 30, 18,
                                18, 30, 10),
                  new_combination = c(1, 1, 1, 1,
                                      0,0,0,
                                      0,
                                      1,
                                      1,1, 1, 1,
                                      0, 0, 0))

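Basically, new_combination is an indicator that takes the value 1 if any possible combination within a combination_id is new (i.e. not present in lower values of combination_id), or if the group is just a single number that has not been seen before. It takes the value 0 if a number stands alone but has been seen before (like 20 in group 3), or if all of the group's combinations have been seen before (like groups 2 and 6). So group 1 takes the value 1 because none of its numbers or combinations have appeared before. Group 2 takes 0 because all of its possible combinations are also in group 1. Group 3 is just a single number that was seen before, so it takes 0. Group 4 has a new number (40), so it takes 1. Group 5 has new combinations involving the number 30, so it takes 1, and group 6 has no new combinations, so it takes 0.

I hope this makes clear what I am looking for. Any ideas? Thanks a lot.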

【Comments】:

Tags: c++ r matrix dplyr tidyverse

【Solution 1】:

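library(data.table)
setDT(combinations)  # convert to a data.table by reference
# rowid(number) == 1 marks the first occurrence of each number; a group is
# flagged 1 if it contains at least one such first occurrence
combinations[, new_combinations := ifelse(
  combination_id %in% combinations[rowid(number) == 1, combination_id], 1, 0)]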
#    combination_id number new_combinations
# 1:              1     20                1
# 2:              1     10                1
# 3:              1     12                1
# 4:              1     18                1
# 5:              2     20                0
# 6:              2     10                0
# 7:              2     12                0
# 8:              3     20                0
# 9:              4     40                1
#10:              5     20                1
#11:              5     10                1
#12:              5     30                1
#13:              5     18                1
#14:              6     18                0
#15:              6     30                0
#16:              6     10                0

【Comments】:

【Solution 2】:

A dplyr approach:

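library(dplyr)
combinations %>%
  # TRUE on the row where each number appears for the first time
  dplyr::mutate(new_combination = !duplicated(number)) %>%
  group_by(combination_id) %>%
  # flag the whole group if any of its numbers is a first occurrence
  dplyr::mutate(new_combination = as.numeric(any(new_combination))) %>%
  ungroup()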

   combination_id number new_combination
            <dbl>  <dbl>           <dbl>
 1              1     20               1
 2              1     10               1
 3              1     12               1
 4              1     18               1
 5              2     20               0
 6              2     10               0
 7              2     12               0
 8              3     20               0
 9              4     40               1
10              5     20               1
11              5     10               1
12              5     30               1
13              5     18               1
14              6     18               0
15              6     30               0
16              6     10               0

【Comments】:

【Solution 3】:

A base R option with ave + duplicated

transform(
  combinations,
  new_combination = ave(+!duplicated(number), combination_id, FUN = max)
)

which gives

   combination_id number new_combination
1               1     20               1
2               1     10               1
3               1     12               1
4               1     18               1
5               2     20               0
6               2     10               0
7               2     12               0
8               3     20               0
9               4     40               1
10              5     20               1
11              5     10               1
12              5     30               1
13              5     18               1
14              6     18               0
15              6     30               0
16              6     10               0
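
The one-liner above packs three steps into a single expression. As a rough illustration of how it works, the sketch below splits it into named intermediate values; the variable names first_seen and first_seen_int are made up for this breakdown and are not part of the original answer.

```r
combinations <- data.frame(
  combination_id = c(1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 5, 6, 6, 6),
  number = c(20, 10, 12, 18, 20, 10, 12, 20, 40, 20, 10, 30, 18, 18, 30, 10)
)

# Step 1: TRUE on the row where each number appears for the first time
first_seen <- !duplicated(combinations$number)
which(first_seen)
# rows 1, 2, 3, 4, 9 and 12

# Step 2: unary plus coerces the logical vector to integer 0/1
first_seen_int <- +first_seen

# Step 3: take the maximum within each combination_id, so a single
# first-time number is enough to flag every row of that group
new_combination <- ave(first_seen_int, combinations$combination_id, FUN = max)
```

Groups 1, 4 and 5 each contain at least one first-time number (rows 1 to 4, row 9 and row 12 respectively), which is why only those groups end up with 1.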

【Comments】:
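
The answers above flag a group whenever it contains the first occurrence of some number. That reproduces the expected output for the sample data, but it would not flag a group that only recombines numbers already seen in different earlier groups. The following is a minimal base R sketch of a stricter reading, assuming that "combination" means any subset of a group's numbers and that "earlier" means a lower combination_id. Under that assumption a group is new exactly when no earlier group contains all of its numbers. The helper name flag_new_combinations and the extra group 7 are hypothetical and were added only for illustration.

```r
# Sample data from the question plus a hypothetical group 7 that pairs
# 12 and 30, two numbers that never appear together in an earlier group
combinations <- data.frame(
  combination_id = c(1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7),
  number = c(20, 10, 12, 18, 20, 10, 12, 20, 40, 20, 10, 30, 18, 18, 30, 10, 12, 30)
)

# Hypothetical helper: 1 for groups whose numbers are not all contained
# in any single earlier group, 0 otherwise
flag_new_combinations <- function(df) {
  # One vector of numbers per group, in ascending combination_id order
  sets <- split(df$number, df$combination_id)
  is_new <- vapply(seq_along(sets), function(i) {
    earlier <- sets[seq_len(i - 1)]
    # Not new if some earlier group contains every number of this group
    !any(vapply(earlier, function(s) all(sets[[i]] %in% s), logical(1)))
  }, logical(1))
  names(is_new) <- names(sets)
  # Broadcast the group-level flag back to the rows
  as.integer(is_new[as.character(df$combination_id)])
}

combinations$new_combination <- flag_new_combinations(combinations)

# For comparison: the first-occurrence logic used in the answers
first_number_only <- ave(+!duplicated(combinations$number),
                         combinations$combination_id, FUN = max)
```

On groups 1 to 6 both approaches agree with the expected output in the question. They differ only on the hypothetical group 7: the subset test gives 1 because no earlier group holds both 12 and 30, while the first-occurrence logic gives 0 because both numbers were seen before. Which behaviour is wanted depends on how the rule in the question is interpreted.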