Leave only unique rows in 3 columns out of 4 [duplicate]

【Question title】: Leave only unique rows in 3 columns out of 4 [duplicate]
【Posted】: 2020-12-23 23:52:29
【Question】:

I have a data frame:

Date                      ID     Type    Value
2020-08-04 03:00:00        1    active     14
2020-08-04 03:00:00        1    active     15
2020-08-04 03:00:00        2    active     16
2020-08-04 03:00:00        2    passive     17

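I want to remove rows that repeat the same values in the Date, ID and Type columns, keeping only the first row of each combination. So the desired result is: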

Date                      ID     Type    Value
2020-08-04 03:00:00        1    active     14
2020-08-04 03:00:00        2    active     16
2020-08-04 03:00:00        2    passive     17

As you can see, the second row is gone. How can I do this?

【Question comments】:

If your data frame is called df, then df[-which(duplicated(df[, c("Date", "ID", "Type")])), ] should work.

Tags: r dataframe

【Solution 1】:

I suggest using paste() to build a combined key from the three columns, and then using duplicated() to drop the repeated keys:

#Code
mdf[!duplicated(paste(mdf$Date, mdf$ID, mdf$Type), fromLast = FALSE), ]

Output:

                Date ID    Type Value
1 04/08/2020 3:00:00  1  active    14
3 04/08/2020 3:00:00  2  active    16
4 04/08/2020 3:00:00  2 passive    17

Some data used:

#Data
mdf <- structure(list(Date = c("04/08/2020 3:00:00", "04/08/2020 3:00:00", 
"04/08/2020 3:00:00", "04/08/2020 3:00:00"), ID = c(1L, 1L, 2L, 
2L), Type = c("active", "active", "active", "passive"), Value = 14:17), row.names = c(NA, 
-4L), class = "data.frame")
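
A combined paste() key can, in principle, collide when the values themselves contain the separator, so it may be worth comparing the key columns directly instead. The following is a minimal base R sketch of that variant, not the answerer's own code; the names key_cols and result are made up for illustration, and the sample data is rebuilt with data.frame() so the block runs on its own.

```r
# Sample data with the same values as the mdf object defined above.
mdf <- data.frame(
  Date  = rep("04/08/2020 3:00:00", 4),
  ID    = c(1L, 1L, 2L, 2L),
  Type  = c("active", "active", "active", "passive"),
  Value = 14:17
)

# Columns that define a duplicate (Value is deliberately left out).
# key_cols is an illustrative name, not something from the original answer.
key_cols <- c("Date", "ID", "Type")

# duplicated() on a data frame compares the selected columns row by row;
# negating it keeps the first row of each Date/ID/Type combination.
result <- mdf[!duplicated(mdf[, key_cols]), ]
print(result)
```

Using a logical index with ! also behaves sensibly when there are no duplicates at all, whereas a negative which() index would then select zero rows.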

【Comments】:

【Solution 2】:

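If your goal is to keep the minimum Value for each Date, ID and Type combination, you can use this dplyr solution:

library(dplyr)

mdf %>% 
  group_by(Date, ID, Type) %>%      # one group per Date/ID/Type combination
  mutate(Value = min(Value)) %>%    # overwrite Value with the group minimum
  unique()                          # drop the rows that are now identical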

This gives us:

  Date                  ID Type    Value
  <chr>              <int> <chr>   <int>
1 04/08/2020 3:00:00     1 active     14
2 04/08/2020 3:00:00     2 active     16
3 04/08/2020 3:00:00     2 passive    17
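
The same "minimum per group" idea can also be expressed with summarise(), which collapses each group to a single row in one step. This is a hedged sketch rather than the answerer's code: it assumes dplyr 1.0.0 or later (for the .groups argument), rebuilds the sample data so it is self-contained, and uses result as an illustrative name.

```r
library(dplyr)

# Sample data with the same values as the mdf object from Solution 1.
mdf <- data.frame(
  Date  = rep("04/08/2020 3:00:00", 4),
  ID    = c(1L, 1L, 2L, 2L),
  Type  = c("active", "active", "active", "passive"),
  Value = 14:17
)

# Collapse every Date/ID/Type group to one row holding its smallest Value.
# .groups = "drop" returns an ungrouped tibble.
result <- mdf %>%
  group_by(Date, ID, Type) %>%
  summarise(Value = min(Value), .groups = "drop")

print(result)
```

Unlike distinct(), this keeps only the grouping columns and the summarised Value, so any other columns would have to be summarised explicitly.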

【Comments】:

【Solution 3】:

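Using dplyr's distinct():

# Note: the header line has one fewer field than the data rows, so read.table()
# uses the first field of each row as row names. As a result the S.no column
# holds the date and the Date column holds the time (see the output below).
tble = read.table(text='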
S.no Date                      ID     Type    Value
1 2020-08-04 03:00:00        1    active     14
2 2020-08-04 03:00:00        1    active     15
3 2020-08-04 03:00:00        2    active     16
4 2020-08-04 03:00:00        2    passive     17')

library(dplyr)

tble %>% distinct(Date, ID, Type, .keep_all=TRUE)
#>         S.no     Date ID    Type Value
#> 1 2020-08-04 03:00:00  1  active    14
#> 3 2020-08-04 03:00:00  2  active    16
#> 4 2020-08-04 03:00:00  2 passive    17

Created on 2020-09-04 by the reprex package (v0.3.0)

【Comments】: