Give an ID based on the total number of unique appearances in each group using dplyr: answers

【Question title】: Give an ID based on the total number of unique appearances in each group using dplyr
【Posted】: 2021-08-24 15:12:07
【Question description】:

I have been struggling with this problem and would appreciate your guidance and help. I have a data.frame that looks like this:

col1 <- c("a","a","b", "a","b","c","a","c","d")
replicate <- c("rep1","rep1","rep1","rep2","rep2","rep2","rep3","rep3","rep3")
df = data.frame(col1, replicate)

  col1 replicate
1    a      rep1
2    a      rep1
3    b      rep1
4    a      rep2
5    b      rep2
6    c      rep2
7    a      rep3
8    c      rep3
9    d      rep3

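I want to create another column that holds, for each element of col1, the number of replicates (values of the replicate column) in which it appears, without counting duplicates within a replicate. I would like my data to look like this: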

  col1 replicate  ID
1    a      rep1  3
2    a      rep1  3
3    b      rep1  2
4    a      rep2  3
5    b      rep2  2
6    c      rep2  2
7    a      rep3  3
8    c      rep3  2
9    d      rep3  1

This is because "a" appears in all 3 replicates, "b" is present in rep1 and rep2, "c" in rep2 and rep3, and "d" only in rep3.
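
The counting rule described above can be sketched in base R, without any packages. This is only an illustration of the intended logic, not code from the original post; the helper names `pairs` and `counts` are made up for the example.

```r
# Rebuild the example data from the question
col1 <- c("a", "a", "b", "a", "b", "c", "a", "c", "d")
replicate <- c("rep1", "rep1", "rep1", "rep2", "rep2", "rep2", "rep3", "rep3", "rep3")
df <- data.frame(col1, replicate)

# Drop duplicated (col1, replicate) pairs so each replicate counts once per value
pairs <- unique(df[, c("col1", "replicate")])

# Count how many distinct replicates remain for each col1 value
counts <- table(pairs$col1)

# Look up the count for every row by its col1 value
df$ID <- as.integer(counts[as.character(df$col1)])
df
```

The duplicated ("a", "rep1") row is removed before counting, which is why "a" gets 3 rather than 4.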

【Comments】:

Use n_distinct().

Tags: r dplyr data.table tidyverse tidyr

【Solution 1】:

library(dplyr)
# Within each col1 group, count the distinct (col1, replicate) combinations
df %>% group_by(col1) %>%
  mutate(ID = n_distinct(col1, replicate))

# A tibble: 9 x 3
# Groups:   col1 [4]
  col1  replicate    ID
  <chr> <chr>     <int>
1 a     rep1          3
2 a     rep1          3
3 b     rep1          2
4 a     rep2          3
5 b     rep2          2
6 c     rep2          2
7 a     rep3          3
8 c     rep3          2
9 d     rep3          1
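
Because the rows are already grouped by col1, every row in a group shares the same col1 value, so counting distinct replicate values alone should give the same result. The following is a minimal sketch of that variant, assuming the same example data; the name `result` and the trailing `ungroup()` are additions for illustration, not part of the original answer.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  col1 = c("a", "a", "b", "a", "b", "c", "a", "c", "d"),
  replicate = c("rep1", "rep1", "rep1", "rep2", "rep2", "rep2", "rep3", "rep3", "rep3")
)

result <- df %>%
  group_by(col1) %>%
  # Number of distinct replicates in which this col1 value occurs
  mutate(ID = n_distinct(replicate)) %>%
  # Drop the grouping so later operations are not applied per group
  ungroup()

result
```

Dropping the grouping at the end is a design choice: it avoids surprises if further `mutate()` or `summarise()` calls are chained onto the result.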


【Solution 2】:

Using uniqueN:

library(data.table)
# Convert df to a data.table by reference, then add ID per col1 group
setDT(df)[, ID := uniqueN(paste(col1, replicate)), col1]

Output:

df
   col1 replicate ID
1:    a      rep1  3
2:    a      rep1  3
3:    b      rep1  2
4:    a      rep2  3
5:    b      rep2  2
6:    c      rep2  2
7:    a      rep3  3
8:    c      rep3  2
9:    d      rep3  1
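
Since the grouping is already by col1, the `paste()` step is probably unnecessary: counting unique replicate values within each group should be equivalent. A minimal sketch under that assumption follows; it builds a fresh data.table named `dt` (a name chosen for this example) instead of converting `df` in place.

```r
library(data.table)

# Example data from the question, created directly as a data.table
dt <- data.table(
  col1 = c("a", "a", "b", "a", "b", "c", "a", "c", "d"),
  replicate = c("rep1", "rep1", "rep1", "rep2", "rep2", "rep2", "rep3", "rep3", "rep3")
)

# Add ID by reference: number of unique replicates per col1 value
dt[, ID := uniqueN(replicate), by = col1]

dt
```

Note that `:=` modifies `dt` in place, so no reassignment is needed.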
