How to determine the number of unique values based on multiple criteria with dplyr: answers

【Question title】: how to determine the number of unique values based on multiple criteria dplyr
【Posted】: 2022-08-03 20:58:28
【Question】:

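I have a df that looks like this:

df(Site=c(A,B,C,D,E), Species=c(1,2,3,4), Year=c(1980:2010)).

I want to count the number of different years in which each species occurs at each site, and store the result in a new column called nYear. I tried filtering by group and using mutate combined with n_distinct(), but it doesn't quite work.

Here is part of the code I have been using: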

Df1 <- Df %>%
  filter(Year>1985)%>%
  mutate(nYear = n_distinct(Year[Year %in% site]))%>%
  group_by(Species,Site, Year) %>% 
  arrange(Species, .by_group=TRUE) 
  ungroup()

Any help would be welcome.

Thanks!

Tags: r dplyr tidyverse filtering unique

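【Solution 1】:

Your approach is fine; a few things just need correcting.

First, let's make some reproducible data (your code gives an error).

library(dplyr)  # provides %>%, filter(), group_by(), summarise(), ...
df <- data.frame("site"=LETTERS[1:5], "species"=1:5, "Year"=1981:2010)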

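When you want to summarise values within groups, use summarise rather than mutate. It returns a shorter tibble that contains only the grouping columns and the summary values (fewer columns and rows).

mutate, on the other hand, is meant to modify an existing tibble and by default keeps all of its rows and columns.

The order of the functions in the chain also needs to change.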
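To make the summarise/mutate difference concrete, here is a minimal sketch on a small made-up data frame (the name `toy` and its values are hypothetical, and the `.groups` argument assumes dplyr 1.0 or later). `summarise()` returns one row per species/site group, while a grouped `mutate()` keeps every original row and repeats the group's count on each one. The latter may be closer to the question's "new column called nYear" if the original rows need to be kept.

```r
library(dplyr)

# Hypothetical toy data: species 1 at site A is recorded twice in 1990
toy <- data.frame(
  site    = c("A", "A", "A", "B", "B", "B"),
  species = c(1, 1, 1, 2, 2, 2),
  Year    = c(1990, 1990, 1995, 2000, 2001, 2002)
)

# summarise(): one row per group, only the grouping columns plus the summary
per_group <- toy %>%
  group_by(species, site) %>%
  summarise(nYear = n_distinct(Year), .groups = "drop")

# grouped mutate(): all 6 rows kept, nYear repeated within each group
per_row <- toy %>%
  group_by(species, site) %>%
  mutate(nYear = n_distinct(Year)) %>%
  ungroup()

print(per_group)
print(per_row)
```

Here `per_group` has 2 rows (nYear 2 for site A, 3 for site B), whereas `per_row` keeps all 6 input rows with nYear attached to each.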

df %>%
  filter(Year > 1985) %>%
  group_by(species, site) %>%
  summarise(nYear = length(unique(Year))) %>% # instead of mutate
  arrange(species, .by_group = TRUE) %>%
  ungroup()

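First comes group_by(species, site), without Year, and then summarise and arrange. If Year were also a grouping variable, every group would contain a single year and nYear would always be 1.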

# A tibble: 5 × 3
  species site  nYear
    <int> <chr> <int>
1       1 A         5
2       2 B         5
3       3 C         5
4       4 D         5
5       5 E         5
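The effect of grouping by Year can be checked directly. This sketch rebuilds the answer's sample `df` and runs the pipeline twice, once with Year in `group_by()` and once without. It uses `n_distinct()` in place of `length(unique())`, which counts the same thing. With Year included, each group holds exactly one year, so every nYear is 1.

```r
library(dplyr)

df <- data.frame(site = LETTERS[1:5], species = 1:5, Year = 1981:2010)

# Grouping by Year as well: every group holds a single year
by_year <- df %>%
  filter(Year > 1985) %>%
  group_by(species, site, Year) %>%
  summarise(nYear = n_distinct(Year), .groups = "drop")

# Grouping only by species and site: the years are counted per group
by_site <- df %>%
  filter(Year > 1985) %>%
  group_by(species, site) %>%
  summarise(nYear = n_distinct(Year), .groups = "drop")

print(unique(by_year$nYear))
print(by_site$nYear)
```

On this data `by_year` has 25 rows (one per year from 1986 to 2010), all showing 1. `by_site` has the 5 rows shown above, each with nYear = 5.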

【Discussion】:

【Solution 2】:

You can apply distinct() to the filtered frame and then count by the groups you are interested in:

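Df %>%
  filter(Year > 1985) %>%
  distinct(Site, Species, Year) %>%     # one row per site/species/year, ignoring any other columns
  count(Site, Species, name = "nYear")  # rows left per group = number of distinct years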
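The distinct() step is what turns a row count into a year count. Survey data often contains several records of the same species at the same site in the same year, and `count()` on its own counts rows. The sketch below uses a hypothetical `obs` data frame with one duplicated record (species 1 at site A in 1990). It compares `count()` alone, `distinct()` followed by `count()`, and the `group_by()` plus `summarise()` approach from Solution 1.

```r
library(dplyr)

# Hypothetical observations: species 1 at site A is seen twice in 1990
obs <- data.frame(
  Site    = c("A", "A", "A", "B"),
  Species = c(1, 1, 1, 2),
  Year    = c(1990, 1990, 1995, 1990)
)

# count() alone counts rows, so the duplicate 1990 record inflates nYear
rows_only <- obs %>%
  count(Site, Species, name = "nYear")

# distinct() first keeps one row per Site/Species/Year, then count() counts years
years <- obs %>%
  distinct(Site, Species, Year) %>%
  count(Site, Species, name = "nYear")

# Same result via group_by() + summarise(n_distinct())
years_alt <- obs %>%
  group_by(Site, Species) %>%
  summarise(nYear = n_distinct(Year), .groups = "drop")

print(rows_only)
print(years)
```

Here `rows_only` reports 3 for species 1 at site A because the duplicate 1990 record is counted twice, while both `years` and `years_alt` report 2.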

【Discussion】: