【Question title】:Data wrangling to add columns that sum up counts of mapped values in R
【Posted】:2021-07-23 17:09:00
【Question】:

I have a data frame of counts per person, like this:

Person_ID Apple Pear Chicken Steak Spinach
   1        1    0      0      5      1
   2        1    1      1      0      0
   3        0    0      0      3      2

I have another data frame that maps each food to its food group, like this:

Food     Group
Apple    Fruit
Pear     Fruit
Chicken  Meat
Steak    Meat
Spinach  Vegetable

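I'd like to use the second data frame to add new columns to the first: one column per food group, holding the sum of the counts in that group's constituent columns, so that the final output looks like this: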

Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
   1        1    0      0      5      1      1    5       1
   2        1    1      1      0      0      2    1       0
   3        0    0      0      3      2      0    3       2

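I haven't been able to do this in a clean way, and it seems rather complicated. I'm wondering whether there is a simple solution, and would appreciate any suggestions.

【Comments】:

    Tags: r dataframe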


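    【Solution 1】:

    We only need an assignment: subset the columns of 'df1' using the 'Food' column of 'df2', split that subset into a list by the 'Group' column, take the rowSums of each piece, and assign the results back to 'df1' as new columns named after the 'Group' values.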

    m1 <- sapply(split.default(df1[df2$Food], df2$Group), rowSums)
    df1[colnames(m1)] <- m1
    

    -output

    df1
      Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
    1         1     1    0       0     5       1     1    5         1
    2         2     1    1       1     0       0     2    1         0
    3         3     0    0       0     3       2     0    3         2
    

    data

    df1 <- structure(list(Person_ID = 1:3, Apple = c(1L, 1L, 0L), Pear = c(0L, 
    1L, 0L), Chicken = c(0L, 1L, 0L), Steak = c(5L, 0L, 3L), Spinach = c(1L, 
    0L, 2L)), class = "data.frame", row.names = c(NA, -3L))
    
    df2 <- structure(list(Food = c("Apple", "Pear", "Chicken", "Steak", 
    "Spinach"), Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"
    )), class = "data.frame", row.names = c(NA, -5L))
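
To make the one-liner in this answer easier to follow, here is a minimal sketch that unpacks it into separate steps. The intermediate names `foods` and `by_group` are introduced only for illustration, and the data is rebuilt with `data.frame()` rather than `structure()`.

```r
# Sample data, same values as in the answer above
df1 <- data.frame(
  Person_ID = 1:3,
  Apple   = c(1L, 1L, 0L),
  Pear    = c(0L, 1L, 0L),
  Chicken = c(0L, 1L, 0L),
  Steak   = c(5L, 0L, 3L),
  Spinach = c(1L, 0L, 2L)
)
df2 <- data.frame(
  Food  = c("Apple", "Pear", "Chicken", "Steak", "Spinach"),
  Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"),
  stringsAsFactors = FALSE  # keep Food as character so it indexes columns by name
)

# Step 1: keep only the food columns, in the order listed in df2$Food
foods <- df1[df2$Food]

# Step 2: split the columns (not the rows) into one data frame per group;
# split.default() treats the data frame as a list of columns
by_group <- split.default(foods, df2$Group)

# Step 3: row-wise totals within each group; sapply() binds them into a matrix
# with one column per group
m1 <- sapply(by_group, rowSums)

# Step 4: append the totals as new columns named after the groups
df1[colnames(m1)] <- m1
df1
```

Calling `split.default()` explicitly matters here: plain `split()` on a data frame dispatches to `split.data.frame()`, which splits rows rather than columns.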
    

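    【Comments】:

    • This is really elegant! If I'm not mistaken, we don't have an equivalent of split.default in the tidyverse.
    • @AnoushiravanR Thanks, there should be something similar in group_split, i.e. group_split.default etc.
    【Solution 2】:

    You can also use the following tidyverse solution: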

    library(dplyr)
    library(tidyr)
    
    df1 %>%
      left_join(df1 %>%
                  pivot_longer(!Person_ID, names_to = "Food") %>%
                  left_join(df2, by = "Food") %>%
                  group_by(Person_ID, Group) %>%
                  summarise(sum = sum(value), .groups = "drop") %>%
                  pivot_wider(names_from = Group, values_from = sum), by = "Person_ID")
    
      Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
    1         1     1    0       0     5       1     1    5         1
    2         2     1    1       1     0       0     2    1         0
    3         3     0    0       0     3       2     0    3         2
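
The nested pipeline in this answer can be read more easily when its stages are named. The sketch below runs the same steps with two intermediate objects, `long` and `totals`; both names are hypothetical and added only for illustration, and the sample data is assumed to match the question.

```r
library(dplyr)
library(tidyr)

# Sample data, same values as in the question
df1 <- data.frame(
  Person_ID = 1:3,
  Apple   = c(1L, 1L, 0L),
  Pear    = c(0L, 1L, 0L),
  Chicken = c(0L, 1L, 0L),
  Steak   = c(5L, 0L, 3L),
  Spinach = c(1L, 0L, 2L)
)
df2 <- data.frame(
  Food  = c("Apple", "Pear", "Chicken", "Steak", "Spinach"),
  Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"),
  stringsAsFactors = FALSE
)

# Stage 1: one row per (person, food), with the food's group attached
long <- df1 %>%
  pivot_longer(!Person_ID, names_to = "Food") %>%
  left_join(df2, by = "Food")

# Stage 2: sum counts per (person, group), then spread groups back into columns
totals <- long %>%
  group_by(Person_ID, Group) %>%
  summarise(sum = sum(value), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = sum)

# Stage 3: attach the group totals to the original wide table
result <- left_join(df1, totals, by = "Person_ID")
result
```

Joining back on `Person_ID` is what keeps the original food columns alongside the new group columns; `totals` on its own holds only the per-group sums.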
    

    【Comments】:

    • Why not map? :-)
    • Haha yes, I tried pmap yesterday, but figured it would only make things harder.
    【Solution 3】:

    You can also use purrr::reduce here:

    library(tidyverse)
    
    reduce(unique(df2$Group), 
           .init = df1,
           ~ .x %>% 
             mutate(!!.y := rowSums(df1[names(df1) %in% df2$Food[df2$Group == .y]]))
           )
    
    #>   Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
    #> 1         1     1    0       0     5       1     1    5         1
    #> 2         2     1    1       1     0       0     2    1         0
    #> 3         3     0    0       0     3       2     0    3         2
    

    Or the equivalent in base R:

    Reduce(function(x, y) {
      x[[y]] <- rowSums(x[names(x) %in% df2$Food[df2$Group == y]])  # names(x), since x gains a column each step
      x
    }, init = df1, unique(df2$Group))
    
      Person_ID Apple Pear Chicken Steak Spinach Fruit Meat Vegetable
    1         1     1    0       0     5       1     1    5         1
    2         2     1    1       1     0       0     2    1         0
    3         3     0    0       0     3       2     0    3         2
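
For readers less familiar with folds, the `Reduce` call in this answer is roughly a loop over the group names that adds one column per pass. A minimal sketch of that loop, assuming the same sample data; `result` and `foods_in_grp` are names introduced here for illustration:

```r
# Sample data, same values as in the question
df1 <- data.frame(
  Person_ID = 1:3,
  Apple   = c(1L, 1L, 0L),
  Pear    = c(0L, 1L, 0L),
  Chicken = c(0L, 1L, 0L),
  Steak   = c(5L, 0L, 3L),
  Spinach = c(1L, 0L, 2L)
)
df2 <- data.frame(
  Food  = c("Apple", "Pear", "Chicken", "Steak", "Spinach"),
  Group = c("Fruit", "Fruit", "Meat", "Meat", "Vegetable"),
  stringsAsFactors = FALSE
)

# Start from the original table and add one column per food group
result <- df1
for (grp in unique(df2$Group)) {
  # Names of the food columns that belong to this group
  foods_in_grp <- df2$Food[df2$Group == grp]
  # Row-wise total over those columns becomes the new group column
  result[[grp]] <- rowSums(df1[foods_in_grp])
}
result
```

Selecting the columns by name from the untouched `df1` avoids any dependence on how many columns `result` has accumulated so far.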
    

    【Comments】:

    • Brilliant!