【Posted】: 2021-04-08 18:54:01
【Question】:
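I have a sample tbl_df that I'm trying to find a solution for. At a high level, I want to compare each student's top result in 2021 (the type for which they have the largest count) with their result for that same type in the most recent year before 2021. I'd like to use dplyr::filter, but I can't figure out how to filter properly while keeping a tbl_df to get my output.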
In short:
- Group by full_name, then for 2021 select the row whose type has the max value in the count column.
- For that same type, select the next most recent year.
As you can see, because Eric Collins has no rows for 2020, his most recent prior year is 2019, while the others have values for 2020.
Example:
sample_df <- tibble::tribble(
~year, ~full_name, ~type, ~count, ~avg_score, ~max,
2021L, "Jason Valdez", "Sciences", "33", 98, 99,
2021L, "Jason Valdez", "Humanities", "59", 97, 99,
2020L, "Jason Valdez", "Sciences", "164", 97, 99,
2020L, "Jason Valdez", "Humanities", "231", 96, 98,
2019L, "Jason Valdez", "Sciences", "933", 96, 99,
2019L, "Jason Valdez", "Humanities", "853", 95, 99,
2021L, "Eric Collins", "Sciences", "21", 92, 93,
2019L, "Eric Collins", "Sciences", "831", 94, 97,
2019L, "Eric Collins", "Humanities", "10", 94, 97,
2021L, "Sebastian Goldberg", "Sciences", "41", 93, 96,
2020L, "Sebastian Goldberg", "Sciences", "476", 94, 98,
2020L, "Sebastian Goldberg", "Humanities", "81", 93, 96,
2019L, "Sebastian Goldberg", "Sciences", "1418", 95, 98
)
output_df <- tibble::tribble(
~year, ~full_name, ~type, ~count, ~avg_score, ~max,
2021L, "Jason Valdez", "Humanities", 59L, 97L, 99L,
2020L, "Jason Valdez", "Humanities", 231L, 96L, 98L,
2021L, "Eric Collins", "Sciences", 21L, 92L, 93L,
2019L, "Eric Collins", "Sciences", 831L, 94L, 97L,
2021L, "Sebastian Goldberg", "Sciences", 41L, 93L, 96L,
2020L, "Sebastian Goldberg", "Sciences", 476L, 94L, 98L
)
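One possible way to express the two steps above with dplyr::filter is sketched below. It is not a confirmed answer, only a minimal sketch under a few assumptions: the helper name pick_top_type_and_previous and its target_year argument are made up for illustration, every student is assumed to have at least one row for the target year, and count is converted to integer first because sample_df stores it as character (so max would otherwise compare strings).

```r
library(dplyr)

# Hypothetical helper: the function name and target_year are assumptions,
# not part of the original question.
pick_top_type_and_previous <- function(df, target_year = 2021L) {
  df %>%
    # count is character in sample_df; convert so comparisons are numeric
    mutate(count = as.integer(count)) %>%
    group_by(full_name) %>%
    # keep only rows of the type with the largest count in the target year
    # (assumes each student has at least one row for target_year)
    filter(
      type == type[year == target_year][which.max(count[year == target_year])]
    ) %>%
    # within that type, keep the target year and the latest earlier year
    filter(year == target_year | year == max(year[year < target_year])) %>%
    ungroup()
}
```

Calling pick_top_type_and_previous(sample_df) should keep the input row order, so the rows come out in the same order as output_df. If a student had no year earlier than target_year, max() would emit a warning and only the target-year row would remain.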
【Discussion】: