Use `dplyr` to divide rows by group: answer

【Question title】: Use `dplyr` to divide rows by group
【Posted】: 2020-07-03 08:18:51
【Question】:

While learning dplyr, I want to divide each row by another row that holds the total of the corresponding group.

I generated the test data with:

library(dplyr)

# building test data
data("OrchardSprays")

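# per-treatment totals of the observed decreases
totals <- OrchardSprays %>% group_by(treatment) %>%
  summarise(decrease = sum(decrease))
# add an extra, unobserved decrease to each total (8 treatments, A to H)
totals$decrease <- totals$decrease + seq(10, 80, 10)
# mark these rows as the "total" rows
totals$rowpos <- "total"
totals$colpos <- "total"

# append the total rows; rowpos/colpos are coerced to character
df <- rbind(OrchardSprays, totals)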

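Note the line `totals$decrease <- totals$decrease + seq(10, 80, 10)`: for the sake of this question, I assume that each treatment has an additional decrease that is not observed in any individual row of the data frame, but only in each group's "total" row.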

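What I want to do now is add another column, `decrease_share` (called `treatment_decrease` in the expected output below), in which each row's `decrease` value is divided by the total `decrease` value of its `treatment` group.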

So, for `head(df)`, I would expect output like this:

> head(df)
  decrease rowpos colpos treatment treatment_decrease
1       57      1      1         D           0.178125
2       95      2      1         E          0.1711712
3        8      3      1         B         0.09876543
4       69      4      1         H         0.08603491
5       92      5      1         G          0.1488673
6       90      6      1         F          0.1470588

My real-world example is a bit more complex (more grouping variables and more levels), so I am looking for a proper dplyr solution.
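A minimal sketch of how the same idea carries over to several grouping variables, assuming each group contains exactly one row marked as its total. The tibble `toy` and its columns `region`, `year`, `kind` and `value` are hypothetical illustration data, not the asker's real data:

```r
library(dplyr)

# Hypothetical data: two grouping variables (region, year); each
# region/year group has exactly one row with kind == "total"
toy <- tibble(
  region = c("N", "N", "N", "N", "S", "S"),
  year   = c(2019, 2019, 2020, 2020, 2019, 2019),
  kind   = c("obs", "total", "obs", "total", "obs", "total"),
  value  = c(2, 10, 3, 12, 5, 20)
)

toy_share <- toy %>%
  group_by(region, year) %>%                         # list every grouping variable
  mutate(share = value / value[kind == "total"]) %>% # divide by the group's total row
  ungroup()

toy_share$share  # 0.2 1.0 0.25 1.0 0.25 1.0
```

The subscript `value[kind == "total"]` must return exactly one value per group; a group with no total row makes `mutate()` fail, and a group with several can give wrong results, so it may be worth checking that invariant on real data first.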

【Question discussion】:

Tags: r dplyr

【Solution 1】:

Here is an all-dplyr approach:

library(dplyr) #version >= 1.0.0 (needed for across())
OrchardSprays %>% 
  group_by(treatment) %>%
  summarise(decrease = sum(decrease)) %>%
  # rebuild the question's "total" rows
  mutate(decrease = decrease + seq(10, 80, 10),
         rowpos = "total",
         colpos = "total") %>% 
  # rowpos/colpos must be character to stack with the "total" rows
  bind_rows(mutate(OrchardSprays, across(rowpos:colpos, as.character))) %>%
  group_by(treatment) %>%
  # divide each row by its own group's "total" row
  mutate(treatment_decrease = decrease / decrease[rowpos == "total"])
# A tibble: 72 x 5
# Groups:   treatment [8]
   treatment decrease rowpos colpos treatment_decrease
   <fct>        <dbl> <chr>  <chr>               <dbl>
 1 A               47 total  total               1    
 2 B               81 total  total               1    
 3 C              232 total  total               1    
 4 D              320 total  total               1    
 5 E              555 total  total               1    
 6 F              612 total  total               1    
 7 G              618 total  total               1    
 8 H              802 total  total               1    
 9 D               57 1      1                   0.178
10 E               95 2      1                   0.171
# … with 62 more rows
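Applied directly to the `df` built in the question, only the last `group_by()`/`mutate()` step is needed. In the sketch below the first block merely repeats the question's setup so the snippet runs on its own; `df_share` is a name chosen here for illustration, and the new column uses the question's name `decrease_share`:

```r
library(dplyr)

# Repeat the question's setup so the snippet is self-contained
data("OrchardSprays")
totals <- OrchardSprays %>%
  group_by(treatment) %>%
  summarise(decrease = sum(decrease))
totals$decrease <- totals$decrease + seq(10, 80, 10)
totals$rowpos <- "total"
totals$colpos <- "total"
df <- rbind(OrchardSprays, totals)

# The step the question asks for: divide each row's decrease by
# the decrease of its treatment group's "total" row
df_share <- df %>%
  group_by(treatment) %>%
  mutate(decrease_share = decrease / decrease[rowpos == "total"]) %>%
  ungroup()

head(df_share)
```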

【Discussion】:

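Thank you! It seems to work! However, it is a bit misleading that you also generate the test data in your answer :) It would be easier to follow (for others too) if you removed that part from the answer!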
Also, while this solution is very nice, I had expected that `decrease[rowpos == "total"]` could be replaced by a (simpler) dplyr statement. Is that possible?
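That was meant to introduce you to some additional dplyr features; I'm not sure I agree that it is "misleading". I don't know of a tidyverse verb that would replace `[` in this context, because `filter` works on data frames, not on vectors.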
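If the goal is to avoid the `[` subscript altogether, one possible workaround is a different technique: pull the total rows out with `filter()` and attach them to every row with a join. This is a sketch, not a verb that replaces `[`; the names `obs`, `group_totals`, `group_total` and `df_share` are hypothetical, and the data are rebuilt as in the answer:

```r
library(dplyr) # >= 1.0.0 for across()

data("OrchardSprays")
obs <- mutate(OrchardSprays, across(rowpos:colpos, as.character))

# Per-treatment totals, including the question's extra unobserved decrease
totals <- OrchardSprays %>%
  group_by(treatment) %>%
  summarise(decrease = sum(decrease)) %>%
  mutate(decrease = decrease + seq(10, 80, 10),
         rowpos = "total", colpos = "total")

df <- bind_rows(totals, obs)

# One row per treatment holding that group's total
group_totals <- df %>%
  filter(rowpos == "total") %>%
  select(treatment, group_total = decrease)

# Join the totals onto every row by group, divide, drop the helper column
df_share <- df %>%
  left_join(group_totals, by = "treatment") %>%
  mutate(treatment_decrease = decrease / group_total) %>%
  select(-group_total)
```

With several grouping variables, the join keys would be listed in `by = c(...)`; the trade-off is an extra intermediate table in exchange for a uniformly verb-based pipeline.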