Data wrangling in R to add columns that sum up counts of mapped values: answers

【Question title】: Data wrangling to add columns that sum up counts of mapped values R
【Posted】: 2021-07-23 17:09:00
【Question description】:

I have a data frame of counts for each person, like this:

Person_ID Apple Pear Chicken Steak Spinach
   1        1    0      0      5      1
   2        1    1      1      0      0
   3        0    0      0      3      2

I have another data frame that maps each food to its food group, like this:

Food     Group
Apple    Fruit
Pear     Fruit
Chicken  Meat
Steak    Meat
Spinach  Vegetable

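I want to use the second data frame to add new columns to the first: one new column per food group, holding the sum of the counts in that group's member columns, so that the final output looks like this: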

Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
   1        1    0      0      5      1      1    5       1
   2        1    1      1      0      0      2    1       0
   3        0    0      0      3      2      0    3       2

I can't find a clean way to do this, and it seems fairly complicated. Is there a simple solution? I'd appreciate any suggestions.
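Stated as arithmetic, each requested group column is the sum of its member food columns, which is the same as multiplying the count matrix (people × foods) by a 0/1 membership matrix (foods × groups) built from the mapping table. A minimal base-R sketch of that matrix-product view, which is a different technique from the answers below; it assumes the two tables are data frames named df1 and df2, and the names counts, membership and group_sums are illustrative only:

```r
# Example data re-created from the tables in the question (assumption: plain data frames)
df1 <- data.frame(Person_ID = 1:3,
                  Apple = c(1, 1, 0), Pear = c(0, 1, 0), Chicken = c(0, 1, 0),
                  Steak = c(5, 0, 3), Spinach = c(1, 0, 2))
df2 <- data.frame(Food  = c("Apple", "Pear", "Chicken", "Steak", "Spinach"),
                  Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"),
                  stringsAsFactors = FALSE)

# Count matrix: one row per person, one column per food (in df2's order)
counts <- as.matrix(df1[df2$Food])

# 0/1 membership matrix: rows = foods, columns = groups
membership <- unclass(table(df2$Food, df2$Group))
# table() sorts foods alphabetically, so reorder rows to match counts' columns
membership <- membership[df2$Food, , drop = FALSE]

# Matrix product: each group column is the sum of its member food columns
group_sums <- counts %*% membership
df1[colnames(group_sums)] <- as.data.frame(group_sums)
df1
```

This sums every group in one step, but it only works when all food columns are numeric and every food listed in df2 exists as a column of df1.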

【Question discussion】:

Tags: r dataframe

【Solution 1】:

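We just need an assignment: select the subset of 'df1' columns named in the 'Food' column of 'df2', split that subset into a list by the 'Group' column, take the rowSums of each piece, and assign the results to new columns in 'df1' named after the 'Group' values.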

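# split df1's food columns into one data frame per Group, then rowSums() each piece
m1 <- sapply(split.default(df1[df2$Food], df2$Group), rowSums)
# add the group totals to df1 as new columns (Fruit, Meat, Vegetable)
df1[colnames(m1)] <- m1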

-output

df1
  Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
1         1     1    0       0     5       1     1    5         1
2         2     1    1       1     0       0     2    1         0
3         3     0    0       0     3       2     0    3         2

data

df1 <- structure(list(Person_ID = 1:3, Apple = c(1L, 1L, 0L), Pear = c(0L, 
1L, 0L), Chicken = c(0L, 1L, 0L), Steak = c(5L, 0L, 3L), Spinach = c(1L, 
0L, 2L)), class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(Food = c("Apple", "Pear", "Chicken", "Steak", 
"Spinach"), Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"
)), class = "data.frame", row.names = c(NA, -5L))
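To see what each piece of that one-liner produces, here is a step-by-step sketch on the same data; the intermediate names food_cols and by_group are only for illustration and are not part of the answer:

```r
df1 <- data.frame(Person_ID = 1:3,
                  Apple = c(1, 1, 0), Pear = c(0, 1, 0), Chicken = c(0, 1, 0),
                  Steak = c(5, 0, 3), Spinach = c(1, 0, 2))
df2 <- data.frame(Food  = c("Apple", "Pear", "Chicken", "Steak", "Spinach"),
                  Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"),
                  stringsAsFactors = FALSE)

# Step 1: keep only the food columns, in the order they appear in df2$Food
food_cols <- df1[df2$Food]

# Step 2: split.default() treats the data frame as a list of columns and
# groups those columns by df2$Group, giving a named list of data frames
by_group <- split.default(food_cols, df2$Group)
names(by_group)        # "Fruit" "Meat" "Vegetable"
names(by_group$Meat)   # "Chicken" "Steak"

# Step 3: rowSums() of each piece; sapply() binds the vectors into a
# matrix with one row per person and one column per group
m1 <- sapply(by_group, rowSums)
dim(m1)                # 3 3
```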

【Discussion】:

This is really elegant! If I'm not mistaken, we don't have an equivalent of split.default in the tidyverse.
@AnoushiravanR Thanks, there should be something similar with group_split, i.e. group_split.default etc.
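Why split.default() rather than plain split(): on a data frame, split() dispatches to the data-frame method and splits the rows, while split.default() treats the data frame as a list and splits its columns. A small sketch on a made-up three-column data frame (df, rows and cols are illustrative names):

```r
# A made-up data frame with three columns and four rows
df <- data.frame(a = 1:4, b = 5:8, c = 9:12)

# split() dispatches to split.data.frame(), which splits ROWS:
# the grouping vector has one entry per row
rows <- split(df, c("x", "x", "y", "y"))
sapply(rows, nrow)    # x = 2, y = 2

# split.default() treats the data frame as a list of COLUMNS:
# the grouping vector has one entry per column
cols <- split.default(df, c("g1", "g1", "g2"))
lapply(cols, names)   # g1: "a" "b"; g2: "c"
```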

【Solution 2】:

You can also use the following tidyverse solution:

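library(dplyr)
library(tidyr)

df1 %>%
  left_join(df1 %>%
              # long format: one row per person and food
              pivot_longer(!Person_ID, names_to = "Food") %>%
              # attach each food's group
              left_join(df2, by = "Food") %>%
              # total count per person and group
              group_by(Person_ID, Group) %>%
              summarise(sum = sum(value), .groups = "drop") %>%
              # back to one column per group, then join onto df1
              pivot_wider(names_from = Group, values_from = sum), by = "Person_ID")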

  Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
1         1     1    0       0     5       1     1    5         1
2         2     1    1       1     0       0     2    1         0
3         3     0    0       0     3       2     0    3         2
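The same pipeline (long format, join, aggregate, wide format, join back) can be sketched in base R without dplyr/tidyr, with stack() standing in for pivot_longer(), match() for the lookup join, and tapply() for summarise() plus pivot_wider(). This is a substitute for, not a transcription of, the tidyverse answer; stacked, long, wide and out are illustrative names:

```r
df1 <- data.frame(Person_ID = 1:3,
                  Apple = c(1, 1, 0), Pear = c(0, 1, 0), Chicken = c(0, 1, 0),
                  Steak = c(5, 0, 3), Spinach = c(1, 0, 2))
df2 <- data.frame(Food  = c("Apple", "Pear", "Chicken", "Steak", "Spinach"),
                  Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"),
                  stringsAsFactors = FALSE)

# Long format (the pivot_longer() step): stack() puts all food columns into
# 'values', with the source column name in 'ind'; Person_ID repeats per column
food_cols <- df1[-1]
stacked <- stack(food_cols)
long <- data.frame(Person_ID = rep(df1$Person_ID, times = ncol(food_cols)),
                   Food = as.character(stacked$ind),
                   value = stacked$values,
                   stringsAsFactors = FALSE)

# Look up each food's group (the left_join(df2, by = "Food") step)
long$Group <- df2$Group[match(long$Food, df2$Food)]

# Sum per person and group, laid out as a person x group matrix
# (the group_by() + summarise() + pivot_wider() steps)
wide <- tapply(long$value, list(long$Person_ID, long$Group), sum)

# Join back onto df1 by Person_ID (the outer left_join() step)
out <- cbind(df1, wide[as.character(df1$Person_ID), , drop = FALSE])
out
```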

【Discussion】:

Why not map? :-)
Haha yes, I tried pmap yesterday, but figured it would only get more complicated.
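A map-style version iterates over the groups rather than over the foods. This sketch uses base lapply() over a named list of foods per group (purrr::map() would play the same role); foods_by_group is an illustrative name:

```r
df1 <- data.frame(Person_ID = 1:3,
                  Apple = c(1, 1, 0), Pear = c(0, 1, 0), Chicken = c(0, 1, 0),
                  Steak = c(5, 0, 3), Spinach = c(1, 0, 2))
df2 <- data.frame(Food  = c("Apple", "Pear", "Chicken", "Steak", "Spinach"),
                  Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"),
                  stringsAsFactors = FALSE)

# Named list mapping each group to its foods, e.g. $Fruit is c("Apple", "Pear")
foods_by_group <- split(df2$Food, df2$Group)

# One rowSums() per group; assigning the named list to df1[...]
# adds one column per group, named after the list elements
df1[names(foods_by_group)] <- lapply(foods_by_group,
                                     function(foods) rowSums(df1[foods]))
df1
```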

【Solution 3】:

You could also use purrr::reduce here:

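library(tidyverse)

# fold over the group names, starting from df1; each step adds one column
reduce(unique(df2$Group), 
       .init = df1,
       ~ .x %>% 
         # .y is the current group: sum the df1 columns of its foods
         mutate(!!.y := rowSums(df1[names(df1) %in% df2$Food[df2$Group == .y]]))
       )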

#>   Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
#> 1         1     1    0       0     5       1     1    5         1
#> 2         2     1    1       1     0       0     2    1         0
#> 3         3     0    0       0     3       2     0    3         2

Or the equivalent in base R:

Reduce(function(x, y) {
  x[[y]] <- rowSums(x[names(x) %in% df2$Food[df2$Group == y]])  # add group y's total
  x
}, init = df1, unique(df2$Group))

  Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
1         1     1    0       0     5       1     1    5         1
2         2     1    1       1     0       0     2    1         0
3         3     0    0       0     3       2     0    3         2
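Reduce() here is a fold over the group names: it starts from df1, and each step returns the data frame with one more column. Passing accumulate = TRUE keeps the intermediate results, which makes that visible. add_group and steps are hypothetical names, and this sketch selects the food columns by name (x[foods]) rather than with a logical index:

```r
df1 <- data.frame(Person_ID = 1:3,
                  Apple = c(1, 1, 0), Pear = c(0, 1, 0), Chicken = c(0, 1, 0),
                  Steak = c(5, 0, 3), Spinach = c(1, 0, 2))
df2 <- data.frame(Food  = c("Apple", "Pear", "Chicken", "Steak", "Spinach"),
                  Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"),
                  stringsAsFactors = FALSE)

# Step function: add a column named after group y holding the row sums of
# that group's food columns (selected by name), then return the data frame
add_group <- function(x, y) {
  x[[y]] <- rowSums(x[df2$Food[df2$Group == y]])
  x
}

# accumulate = TRUE keeps every intermediate result: the first element is
# df1 itself, and each later element has one more group column
steps <- Reduce(add_group, unique(df2$Group), df1, accumulate = TRUE)
sapply(steps, ncol)            # 6 7 8 9
final <- steps[[length(steps)]]
```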

【Discussion】:

Brilliant!