[Posted]: 2020-01-31 20:22:15
[Question]:
I am analyzing a database containing ecological data. Here is an example:
df <- data.frame(observationID = c("06a4dcc1-a2c1-f1a9-3964-4374c3a26e2a","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","ff8a8b93-f307-4695-ad95-1915c2c46c60","ff8a8b93-f307-4695-ad95-1915c2c46c60","c240564d-a100-4cdb-8a81-8ac197a45e8b","c240564d-a100-4cdb-8a81-8ac197a45e8b","f0a18902-fd16-4d82-bc3a-10bd47454dff","f0a18902-fd16-4d82-bc3a-10bd47454dff","f0a18902-fd16-4d82-bc3a-10bd47454dff"),
animalVernacularName = c("wild boar","Horse","Horse","Horse","Horse","Common Buzzard","Common Buzzard","wild boar","wild boar","Fox"),
behav = c("1","1","2","1","2","1","1","1","1","2"),
value = c("Passing","Interest","Intraspecific interaction","Interest","Intraspecific interaction","Interest","Intraspecific interaction","Eating","Intraspecific interaction","Eating"))
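I want to identify duplicates based on two variables ("observationID" and "behav"), find the "observationID" values of those duplicates, and then remove every case that has one of those "observationID" values. I don't mean removing just one of the two duplicate rows: I mean removing all cases with that "observationID" (an observation can contain more cases than just the duplicated ones). I need to remove all cases with this "observationID" because the whole observation (which consists of several cases) was entered incorrectly.
Identifying the duplicates alone is not a problem, but I also need R to give me the "observationID" values of those duplicates.
There are simple ways to find duplicates across two columns. For example, I tried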
dupe <- duplicated(df[c("observationID","behav")])
This identifies the duplicates, but I don't see any option for getting the corresponding "observationID" values.
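A minimal base-R sketch, assuming the df from the question: the logical vector returned by duplicated() can be used directly to index the observationID column. duplicated() marks only the second and later occurrences of each (observationID, behav) pair, so every faulty observation appears at least once; unique() collapses any repeats. The name bad_ids is just illustrative.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns
# as character on R versions older than 4.0 as well
df <- data.frame(observationID = c("06a4dcc1-a2c1-f1a9-3964-4374c3a26e2a","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","ff8a8b93-f307-4695-ad95-1915c2c46c60","ff8a8b93-f307-4695-ad95-1915c2c46c60","c240564d-a100-4cdb-8a81-8ac197a45e8b","c240564d-a100-4cdb-8a81-8ac197a45e8b","f0a18902-fd16-4d82-bc3a-10bd47454dff","f0a18902-fd16-4d82-bc3a-10bd47454dff","f0a18902-fd16-4d82-bc3a-10bd47454dff"),
                 animalVernacularName = c("wild boar","Horse","Horse","Horse","Horse","Common Buzzard","Common Buzzard","wild boar","wild boar","Fox"),
                 behav = c("1","1","2","1","2","1","1","1","1","2"),
                 value = c("Passing","Interest","Intraspecific interaction","Interest","Intraspecific interaction","Interest","Intraspecific interaction","Eating","Intraspecific interaction","Eating"),
                 stringsAsFactors = FALSE)

# TRUE for each row whose (observationID, behav) pair already appeared in an
# earlier row; the first occurrence of every pair is FALSE
dupe <- duplicated(df[c("observationID", "behav")])

# Index the ID column with the logical vector, then drop repeated IDs
bad_ids <- unique(df$observationID[dupe])
print(bad_ids)
```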
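When I do this
library(tidyr)
test <- pivot_wider(df, names_from = behav, values_from = value, names_prefix = "behav", values_fn = list(value = length))
I do find the duplicates and can see the corresponding "observationID" values, but I can't find a way to make R return those values so that I can remove the observations.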
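The pivot_wider() idea (counting rows per behav code within each observation) can also be sketched with base R's table(), which makes the offending IDs easy to pull out as a character vector. This is a swapped-in technique, not the tidyr approach itself; counts, is_bad and bad_ids are illustrative names, and the df is assumed to be the one from the question.

```r
df <- data.frame(observationID = c("06a4dcc1-a2c1-f1a9-3964-4374c3a26e2a","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","ff8a8b93-f307-4695-ad95-1915c2c46c60","ff8a8b93-f307-4695-ad95-1915c2c46c60","c240564d-a100-4cdb-8a81-8ac197a45e8b","c240564d-a100-4cdb-8a81-8ac197a45e8b","f0a18902-fd16-4d82-bc3a-10bd47454dff","f0a18902-fd16-4d82-bc3a-10bd47454dff","f0a18902-fd16-4d82-bc3a-10bd47454dff"),
                 animalVernacularName = c("wild boar","Horse","Horse","Horse","Horse","Common Buzzard","Common Buzzard","wild boar","wild boar","Fox"),
                 behav = c("1","1","2","1","2","1","1","1","1","2"),
                 value = c("Passing","Interest","Intraspecific interaction","Interest","Intraspecific interaction","Interest","Intraspecific interaction","Eating","Intraspecific interaction","Eating"),
                 stringsAsFactors = FALSE)

# Cross-tabulate: one row per observationID, one column per behav code;
# each cell counts the rows sharing that (observationID, behav) pair
counts <- table(df$observationID, df$behav)

# An observation is faulty if any behav code occurs more than once in it
is_bad <- apply(counts > 1, 1, any)

# apply() keeps the row names, so the faulty IDs are the TRUE entries
bad_ids <- names(is_bad)[is_bad]
print(bad_ids)
```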
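I'm looking for a way to have R return a list of the "observationID" values of the duplicates found based on the "observationID" and "behav" columns. In this example, I want to remove all cases with these "observationID" values: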
"c240564d-a100-4cdb-8a81-8ac197a45e8b"
"f0a18902-fd16-4d82-bc3a-10bd47454dff"
I could then use this list in a filter() on my dataset.
So in the end, I want to get the following result:
df_result <- data.frame(observationID = c("06a4dcc1-a2c1-f1a9-3964-4374c3a26e2a","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","ff8a8b93-f307-4695-ad95-1915c2c46c60","ff8a8b93-f307-4695-ad95-1915c2c46c60"),
animalVernacularName = c("wild boar","Horse","Horse","Horse","Horse"),
behav = c("1","1","2","1","2"),
value = c("Passing","Interest","Intraspecific interaction","Interest","Intraspecific interaction"))
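Putting the two steps together, a base-R sketch under the same assumptions: collect the faulty IDs with duplicated(), then keep only rows whose observationID is not in that list using %in%. With dplyr loaded, the second step could equivalently be written as dplyr::filter(df, !observationID %in% bad_ids). Row names are reset only so the result compares cleanly with df_result.

```r
df <- data.frame(observationID = c("06a4dcc1-a2c1-f1a9-3964-4374c3a26e2a","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","ff8a8b93-f307-4695-ad95-1915c2c46c60","ff8a8b93-f307-4695-ad95-1915c2c46c60","c240564d-a100-4cdb-8a81-8ac197a45e8b","c240564d-a100-4cdb-8a81-8ac197a45e8b","f0a18902-fd16-4d82-bc3a-10bd47454dff","f0a18902-fd16-4d82-bc3a-10bd47454dff","f0a18902-fd16-4d82-bc3a-10bd47454dff"),
                 animalVernacularName = c("wild boar","Horse","Horse","Horse","Horse","Common Buzzard","Common Buzzard","wild boar","wild boar","Fox"),
                 behav = c("1","1","2","1","2","1","1","1","1","2"),
                 value = c("Passing","Interest","Intraspecific interaction","Interest","Intraspecific interaction","Interest","Intraspecific interaction","Eating","Intraspecific interaction","Eating"),
                 stringsAsFactors = FALSE)

df_result <- data.frame(observationID = c("06a4dcc1-a2c1-f1a9-3964-4374c3a26e2a","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","b8431c2b-fa18-42bf-b2c9-3dc23d308b44","ff8a8b93-f307-4695-ad95-1915c2c46c60","ff8a8b93-f307-4695-ad95-1915c2c46c60"),
                        animalVernacularName = c("wild boar","Horse","Horse","Horse","Horse"),
                        behav = c("1","1","2","1","2"),
                        value = c("Passing","Interest","Intraspecific interaction","Interest","Intraspecific interaction"),
                        stringsAsFactors = FALSE)

# Step 1: IDs of observations containing a repeated (observationID, behav) pair
bad_ids <- unique(df$observationID[duplicated(df[c("observationID", "behav")])])

# Step 2: drop every row that belongs to one of those observations,
# not only the duplicated rows themselves
df_clean <- df[!df$observationID %in% bad_ids, ]

# Reset row names (otherwise they keep the original row positions)
rownames(df_clean) <- NULL
print(df_clean)
```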
[Discussion]:
-
Are you looking for
df$observationID[dupe]?
Tags: r filter duplicates