Summarize from two different rows: answers

【Question title】: Summarize from two different rows
【Posted】: 2021-03-30 02:08:24
【Question】:

This is my starting data frame:

test <- data.frame(year = c(2018,2018,2018,2018,2018), 
                    source = c("file1", "file1", "file1", "file1", "file1"),
                    area = c("000", "000", "800", "800", "800"),
                    cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"), 
                    value = c(1000,2000,3000,4000,5000))

  year source area cult2 value
1 2018  file1  000 PBGEX  1000
2 2018  file1  000 QPGEX  2000
3 2018  file1  800 PBGEX  3000
4 2018  file1  800 QPGEX  4000
5 2018  file1  800 QPIND  5000

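For the cult2 categories PBGEX and QPGEX, I need the sum of value for each year/source/area combination, added as new rows labelled RDGEX. I was thinking of using spread and gather, but then I lose many other columns (not shown here).

This is what I want: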

  year source area cult2 value
1 2018  file1  000 PBGEX  1000
2 2018  file1  000 QPGEX  2000
3 2018  file1  800 PBGEX  3000
4 2018  file1  800 QPGEX  4000
5 2018  file1  800 QPIND  5000
6 2018  file1  000 RDGEX  3000
7 2018  file1  800 RDGEX  7000

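【Question comments】:

Tags: r dataframe dplyr summarize

【Solution 1】:

We can filter the rows where 'cult2' is 'QPGEX' or 'PBGEX', compute a grouped sum with group_by and summarise, and then use bind_rows to append the result to the original dataset.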

library(dplyr)
test %>%
    filter(cult2 %in% c("QPGEX", "PBGEX")) %>% 
    group_by(year, source, area) %>%
    summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>%
    bind_rows(test, .)

-output

#   year source area cult2 value
#1 2018  file1  000 PBGEX  1000
#2 2018  file1  000 QPGEX  2000
#3 2018  file1  800 PBGEX  3000
#4 2018  file1  800 QPGEX  4000
#5 2018  file1  800 QPIND  5000
#6 2018  file1  000 RDGEX  3000
#7 2018  file1  800 RDGEX  7000
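
The same filter, grouped sum, and append steps can also be sketched in base R, which may help when dplyr is not available. This is only a minimal sketch and not part of the original answer; it assumes the `test` data frame from the question, and the names `picked`, `sums` and `result` are made up for illustration.

```r
# Data frame from the question (stringsAsFactors = FALSE keeps text columns
# as character on R versions older than 4.0)
test <- data.frame(year = c(2018, 2018, 2018, 2018, 2018),
                   source = c("file1", "file1", "file1", "file1", "file1"),
                   area = c("000", "000", "800", "800", "800"),
                   cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
                   value = c(1000, 2000, 3000, 4000, 5000),
                   stringsAsFactors = FALSE)

# Keep only the two categories that feed the new total
picked <- test[test$cult2 %in% c("PBGEX", "QPGEX"), ]

# Sum 'value' within each year/source/area combination
sums <- aggregate(value ~ year + source + area, data = picked, FUN = sum)

# Label the new rows, then append them in the original column order
sums$cult2 <- "RDGEX"
result <- rbind(test, sums[, names(test)])
result
```

Note that `aggregate` only carries the grouping columns and `value` through, so any extra columns in the real data would need to be added to the formula or joined back afterwards.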

If we also need a proportion column (QPGEX divided by PBGEX within each group):

test %>%
 filter(cult2 %in% c("QPGEX", "PBGEX")) %>% 
 group_by(year, source, area) %>%
 group_by(prop = value[cult2== "QPGEX"]/value[cult2 == "PBGEX"],
        .add = TRUE) %>% 
 summarise(cult2 = "RDGEX", value = sum(value), .groups = 'drop') %>% 
 bind_rows(test, .)
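
The snippet above relies on each year/source/area group containing exactly one QPGEX row and one PBGEX row. As a hedged variant, not from the original answer, the ratio can be computed inside a single `summarise` call using `sum()`, so that each group always yields one number. The name `result` is illustrative, and treating a missing category as a zero total (which gives `Inf` or `NaN` for the ratio) is an assumption that may not suit every dataset.

```r
library(dplyr)

# Data frame from the question
test <- data.frame(year = c(2018, 2018, 2018, 2018, 2018),
                   source = c("file1", "file1", "file1", "file1", "file1"),
                   area = c("000", "000", "800", "800", "800"),
                   cult2 = c("PBGEX", "QPGEX", "PBGEX", "QPGEX", "QPIND"),
                   value = c(1000, 2000, 3000, 4000, 5000),
                   stringsAsFactors = FALSE)

result <- test %>%
  filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
  group_by(year, source, area) %>%
  # 'prop' must come first: later arguments overwrite 'cult2' and 'value'
  summarise(prop = sum(value[cult2 == "QPGEX"]) / sum(value[cult2 == "PBGEX"]),
            cult2 = "RDGEX",
            value = sum(value),
            .groups = "drop") %>%
  # Original rows have no 'prop', so bind_rows fills it with NA for them
  bind_rows(test, .)

result
```

The order of the arguments in `summarise` matters here, because each new column replaces the original one of the same name for the expressions that follow it.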

Alternatively, reshape to wide format, compute the ratio, and join it back onto the original rows:

library(tidyr)
test %>% 
   filter(cult2 %in% c("QPGEX", "PBGEX")) %>%
   pivot_wider(names_from = cult2, values_from = value) %>% 
   # or use spread
   #spread(cult2, value) %>%
   mutate(prop = QPGEX/PBGEX) %>% 
   select(-PBGEX, -QPGEX) %>%
   right_join(test)

-output

# A tibble: 5 x 6
#   year source area   prop cult2 value
#  <dbl> <chr>  <chr> <dbl> <chr> <dbl>
#1  2018 file1  000    2    PBGEX  1000
#2  2018 file1  000    2    QPGEX  2000
#3  2018 file1  800    1.33 PBGEX  3000
#4  2018 file1  800    1.33 QPGEX  4000
#5  2018 file1  800    1.33 QPIND  5000

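【Discussion】:

Thanks, this works well, but I also need the same calculation with a division instead of a sum; the result should be QPGEX/PBGEX... maybe you know how to do that?
@krifur Can you check my update? Is this what you expected?
Oops, sorry, I forgot to mention that my R version is not up to date, so I can't use pivot_wider :/ Edit: there are already two solutions, OK, I'm checking this. Thanks again for your help.
@krifur You can change it to spread; I updated the answer with a commented-out line.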