[Posted]: 2020-11-28 22:15:37
[Question]:
I have a data frame that looks like this:
library(tidyverse)
# Dates are quoted and parsed; an unquoted 2020-01-01 would be evaluated as the arithmetic 2020 - 1 - 1.
df <- tibble(date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01")),
site = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
species = c("vetch", "clover", "vetch", "clover", "vetch", "clover", "vetch", "clover"),
frequency = c(0, 1, 1, 1, 1, 0, 1, 0))
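but with many more dates, sites, and treatments. I want to filter out every observation of a species at a site where all of that species' frequencies at that site (across all treatments and dates) are 0. In the example above, I want to drop clover at site "Z", because it did not occur in any treatment or on any date at that site, but keep clover at site "X", because it did occur in at least one treatment there. So I want: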
tibble(date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01")),
site = c("X", "X", "X", "X", "Z", "Z"),
treatment = c("a", "a", "b", "b", "a", "b"),
species = c("vetch", "clover", "vetch", "clover", "vetch", "vetch"),
frequency = c(0, 1, 1, 1, 1, 1))
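To make the rule concrete, here is a minimal sketch that totals the frequency for each site/species pair and lists the pairs that never occur. It assumes frequencies are never negative, so a total of 0 means every observation is 0; the names `totals` and `never_seen` are made up for illustration.

```r
library(dplyr)
library(tibble)

# Example data from the question (dates quoted and parsed as Date).
df <- tibble(
  date = as.Date(rep("2020-01-01", 8)),
  site = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
  treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
  species = rep(c("vetch", "clover"), 4),
  frequency = c(0, 1, 1, 1, 1, 0, 1, 0)
)

# Total frequency per site/species pair, across all treatments and dates.
totals <- df %>%
  group_by(site, species) %>%
  summarise(total = sum(frequency), .groups = "drop")

# Pairs whose total is 0 (assumes non-negative frequencies): the ones to drop.
never_seen <- totals %>% filter(total == 0)
```

For the example data, only clover at site "Z" should end up in `never_seen`.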
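My first idea was to pivot_wider, select the columns, and then pivot_longer again, but that doesn't work, because the clover column is still selected on account of site "X":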
df %>%
pivot_wider(names_from = species, names_prefix = "spp.", values_from = frequency, values_fill = 0) %>%
group_by(site) %>%
select_if(~ !is.numeric(.) || sum(.) != 0) %>%
pivot_longer(starts_with("spp."), names_to = "species", names_prefix = "spp.", values_to = "frequency") -> df
So I think I need to filter, but I don't know how to do it.
[Discussion]:
Tags: r dataframe dplyr filtering
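One possible approach, offered as a sketch rather than a definitive answer: group by site and species, then keep a group only if at least one of its frequencies is non-zero. This relies on dplyr's `filter()` evaluating its condition per group after `group_by()`; the result name `kept` is made up for illustration.

```r
library(dplyr)
library(tibble)

# Example data from the question (dates quoted and parsed as Date).
df <- tibble(
  date = as.Date(rep("2020-01-01", 8)),
  site = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
  treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
  species = rep(c("vetch", "clover"), 4),
  frequency = c(0, 1, 1, 1, 1, 0, 1, 0)
)

# any() is computed once per site/species group, so the whole group is
# kept or dropped together; individual zero rows survive if the group does.
kept <- df %>%
  group_by(site, species) %>%
  filter(any(frequency != 0)) %>%
  ungroup()
```

Because the test is applied per group, the zero-frequency vetch row at site "X" is kept, while both clover rows at site "Z" are dropped.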