Remove duplicates based on second column (answer)

【Question title】: Remove duplicates based on second column
【Posted】: 2019-05-23 07:12:33
【Question】:

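I'm trying to write code that does the following:
1) group the dataset by ID;
2) count the number of unique months in the column data.month;
3) drop all IDs with fewer than 9 months;
4) list the IDs per company (i.e. if an ID is associated with 2 companies, print it twice);
5) remove duplicate IDs, keeping the record with the highest data.month.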
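The five steps above could be chained in a single dplyr pipeline roughly as follows. This is a minimal sketch on made-up data, assuming dplyr >= 1.0.0 and a hypothetical `company` column (the question does not name it). The key change from the attempts shown later is using `mutate()` rather than `summarise()` for the month count, so `data.month` is still available for steps 4 and 5.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 (for summarise(.groups = ...))

# Hypothetical toy data: ABN is the ID; `company` is an assumed column name
bind <- data.frame(
  ABN        = c(rep(1, 9), rep(2, 3), rep(3, 10)),
  company    = c(rep("A", 5), rep("B", 4), rep("C", 3), rep("D", 10)),
  data.month = c(1:9, 1:3, 3:12),
  stringsAsFactors = FALSE
)

result <- bind %>%
  group_by(ABN) %>%                                # 1) group by ID
  mutate(n_months = n_distinct(data.month)) %>%    # 2) unique months per ID, kept on every row
  filter(n_months >= 9) %>%                        # 3) drop IDs with fewer than 9 months
  group_by(ABN, company) %>%                       # 4) one row per ID/company pair
  summarise(max_month = max(data.month), .groups = "drop") %>%
  arrange(ABN, desc(max_month)) %>%                # 5) highest month first within each ID ...
  distinct(ABN, .keep_all = TRUE)                  #    ... then keep only that first row

result
```

With this toy data, ABN 2 is dropped in step 3 (only 3 months), and ABN 1 keeps its company "B" row because that company's latest month (9) is higher than company "A"'s (5).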

My code works up to step 5). I can't get it to keep only the record (row) with the highest month number for each duplicated ID.

I've looked at a few examples here:

R remove duplicates based on other columns

Remove duplicates based on 2nd column condition

I can work out how to remove duplicates in general, but I can't apply it to my case.
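For reference, the common base-R pattern for "keep one row per key, preferring the highest value of another column" sorts first and then drops repeats with `duplicated()`. A minimal sketch on made-up data, assuming only the `ABN` and `data.month` columns matter:

```r
# Hypothetical toy data; only ABN and data.month matter here
bind <- data.frame(ABN        = c(1, 2, 1, 2, 3),
                   data.month = c(4, 11, 9, 2, 6))

# Sort by ABN, then by data.month descending, so each ABN's highest month comes first
sorted <- bind[order(bind$ABN, -bind$data.month), ]

# duplicated() is TRUE for every repeat after the first occurrence, so this
# keeps exactly one row per ABN: the one with the highest data.month
best <- sorted[!duplicated(sorted$ABN), ]
best
```

The sort order is what makes `!duplicated()` pick the right row; without it, the first row encountered for each ABN would be kept regardless of its month.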

Here are the two pieces of code I've tried:

data.check6 <- bind %>%
  group_by(bind$ABN) %>%
  summarise(count = n_distinct(data.month)) %>%
  filter(count > 8) %>%
  arrange(bind$data.month) %>%
  filter(row_number() == 1)

and:

library(tidyverse)

data.check7 <- bind %>%
  group_by(ABN) %>%
  filter(1 == length(unique(bind$data.month)), !duplicated(bind$data.month))

Now I get this error:

Error in arrange_impl(.data, dots) : incorrect size (345343) at position 1, expecting : 3749
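A likely cause, assuming the message comes from the first snippet: after `summarise()` the data has one row per ABN (presumably the 3749 rows left after the count filter), while `bind$data.month` inside the pipe refers to the whole original column of `bind` (345343 values), so `arrange()` receives a vector of the wrong length. `summarise()` also drops `data.month`, so even the bare column name would no longer be available at that point. A small sketch on made-up data showing both effects:

```r
library(dplyr)

# Hypothetical toy data: 5 rows, 2 distinct ABNs
bind <- data.frame(ABN        = c(1, 1, 2, 2, 2),
                   data.month = c(3, 7, 1, 5, 12))

summed <- bind %>%
  group_by(ABN) %>%
  summarise(count = n_distinct(data.month))

nrow(summed)             # one row per ABN (2 here)
length(bind$data.month)  # the full original column (5 here), unaffected by the pipe
names(summed)            # only "ABN" and "count": data.month has been dropped
```

Inside a dplyr pipe, bare column names (`ABN`, `data.month`) refer to the current, possibly grouped or summarised, data; `bind$...` always reaches back to the original data frame.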

In the end I want a dataset in which each ID appears only once, as the record associated with that ID's highest month (i.e. column value = 12).

【Question comments】:

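Please add a reproducible example along with the expected output.
From your last sentence, I don't think you want to remove duplicates; rather, you want to apply a filter within the `group_by`. In cases like this, a reproducible example is very important.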

Tags: r filter duplicates summarize

【Solution 1】:

I think you're looking for something like this:

Example data:

> bind <- data.frame(ABN = rep(1:3, 3),
+                    data.month = sample(1:12, 9),
+                    other.inf = runif(9))
> 
> bind
  ABN data.month other.inf
1   1         10 0.8102867
2   2          4 0.2919716
3   3          8 0.3391790
4   1          2 0.3698933
5   2          6 0.9155280
6   3          1 0.2680165
7   1          9 0.7541168
8   2          7 0.2018796
9   3         11 0.1546079

Solution:

> bind %>%
+   group_by(ABN) %>%      
+   filter(data.month == max(data.month))
# A tibble: 3 x 3
# Groups:   ABN [3]
    ABN data.month other.inf
  <int>      <int>     <dbl>
1     1         10     0.810
2     2          7     0.202
3     3         11     0.155
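One caveat, given that the question asks for each ID to appear only once: `filter(data.month == max(data.month))` keeps every row that ties for the maximum, so an ID with two records in month 12 would still appear twice. A sketch on made-up data, assuming dplyr >= 1.0.0 for `slice_max()`:

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

# Hypothetical data: ABN 1 has two rows tied at month 12
bind <- data.frame(ABN        = c(1, 1, 1, 2, 2),
                   data.month = c(12, 12, 5, 7, 3),
                   other.inf  = c(0.1, 0.2, 0.3, 0.4, 0.5))

# filter() keeps every row equal to the group maximum, so ties survive
kept_ties <- bind %>%
  group_by(ABN) %>%
  filter(data.month == max(data.month)) %>%
  ungroup()
nrow(kept_ties)   # 3

# slice_max(with_ties = FALSE) returns exactly one row per ABN
one_per_id <- bind %>%
  group_by(ABN) %>%
  slice_max(data.month, n = 1, with_ties = FALSE) %>%
  ungroup()
nrow(one_per_id)  # 2
```

If ties can't occur in the real data, the `filter()` version in the answer is sufficient.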

【Comments】:

Yes, this is exactly what I was looking for. Thanks!