【Question title】: How can I use `mutate_at` for list-columns in a tibble?
【Posted】: 2020-03-18 00:15:26
【Question description】:

I have a tibble with the following structure:

library(tidyverse)

df <- 
  tibble(
    x = 1:3, 
    light_93 = list(1:3, 5:7, 18:20),
    light_94 = list(3:5, 9:11, 18:20),
    light_95 = list(5:7, 44:46, 30:32))

I want to create several new columns, each giving the mean of every list element in one of the `light_` columns. So I want this result:

out <- 
  df %>% 
  mutate(light_93_mean = map_dbl(light_93, mean),
         light_94_mean = map_dbl(light_94, mean),
         light_95_mean = map_dbl(light_95, mean))

Can I automate this with `mutate_at`? (I have hundreds of list-columns.) I can't figure out how to make it work with a tibble.

【Question discussion】:

    Tags: r dplyr tidyverse tibble


    【Solution 1】:

    mutate_atvars 参数中指定要应用的列,然后在每个列中使用map 循环list 并获取mean

    library(dplyr)
    library(purrr)
    df %>%
         mutate_at(vars(starts_with('light')), 
            list(mean = ~ map_dbl(., mean)))
    # A tibble: 3 x 7
    #      x light_93  light_94  light_95  light_93_mean light_94_mean light_95_mean
    #  <int> <list>    <list>    <list>            <dbl>         <dbl>         <dbl>
    #1     1 <int [3]> <int [3]> <int [3]>             2             4             6
    #2     2 <int [3]> <int [3]> <int [3]>             6            10            45
    #3     3 <int [3]> <int [3]> <int [3]>            19            19            31
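As a sketch that goes slightly beyond the answer, the named `list()` passed to `mutate_at` can hold more than one function; each list name becomes a column suffix. The `max` summary below is only an illustrative choice.

```r
library(dplyr)
library(purrr)
library(tibble)

df <- tibble(
  x = 1:3,
  light_93 = list(1:3, 5:7, 18:20),
  light_94 = list(3:5, 9:11, 18:20),
  light_95 = list(5:7, 44:46, 30:32)
)

# Each name in the list ("mean", "max") is appended to the column name,
# giving columns such as light_93_mean and light_93_max.
out_multi <- df %>%
  mutate_at(vars(starts_with("light")),
            list(mean = ~ map_dbl(., mean),
                 max  = ~ map_dbl(., max)))
```

`map_dbl` coerces the integer result of `max` to double, so every new column is `<dbl>`.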
    

    或者使用带有acrossmutatedevel版本

    df %>% 
         mutate(across(starts_with('light'), ~ map_dbl(., mean), .names = "{col}_mean"))
    # A tibble: 3 x 7
    #      x light_93  light_94  light_95  light_93_mean light_94_mean light_95_mean
    #  <int> <list>    <list>    <list>            <dbl>         <dbl>         <dbl>
    #1     1 <int [3]> <int [3]> <int [3]>             2             4             6
    #2     2 <int [3]> <int [3]> <int [3]>             6            10            45
    #3     3 <int [3]> <int [3]> <int [3]>            19            19            31
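A sketch, assuming dplyr 1.0.0 or later: `across` also accepts a named list of functions, and in the `.names` glue string `{col}` stands for the column name and `{fn}` for the list name (newer dplyr documentation spells these `{.col}` and `{.fn}`). The `max` summary is an illustrative addition, not part of the answer.

```r
library(dplyr)   # across() requires dplyr >= 1.0.0
library(purrr)
library(tibble)

df <- tibble(
  x = 1:3,
  light_93 = list(1:3, 5:7, 18:20),
  light_94 = list(3:5, 9:11, 18:20),
  light_95 = list(5:7, 44:46, 30:32)
)

# One across() call, two summaries per list-column; output names follow
# the "{col}_{fn}" template, e.g. light_93_mean and light_93_max.
out_across <- df %>%
  mutate(across(starts_with("light"),
                list(mean = ~ map_dbl(.x, mean), max = ~ map_dbl(.x, max)),
                .names = "{col}_{fn}"))
```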
    

    You can also apply different functions to different sets of columns:

    df %>% 
        mutate(across(starts_with('light'), ~ map_dbl(., mean), .names = "{col}_mean"),
               across(matches('(94|95)$'), ~ map_dbl(., sum), .names = "{col}_sum"))
    # A tibble: 3 x 9
    #      x light_93  light_94  light_95  light_93_mean light_94_mean light_95_mean light_94_sum light_95_sum
    #  <int> <list>    <list>    <list>            <dbl>         <dbl>         <dbl>        <dbl>        <dbl>
    #1     1 <int [3]> <int [3]> <int [3]>             2             4             6           12           18
    #2     2 <int [3]> <int [3]> <int [3]>             6            10            45           30          135
    #3     3 <int [3]> <int [3]> <int [3]>            19            19            31           57           93
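With hundreds of list-columns, it may be simpler to select them by type rather than by name. A sketch, assuming tidyselect 1.1.0 or later (which exports `where()`; it is attached explicitly here to be safe):

```r
library(dplyr)
library(purrr)
library(tibble)
library(tidyselect)  # provides where() in tidyselect >= 1.1.0

df <- tibble(
  x = 1:3,
  light_93 = list(1:3, 5:7, 18:20),
  light_94 = list(3:5, 9:11, 18:20),
  light_95 = list(5:7, 44:46, 30:32)
)

# where(is.list) picks every list-column regardless of its name;
# the integer column x is left untouched.
out_where <- df %>%
  mutate(across(where(is.list), ~ map_dbl(.x, mean), .names = "{col}_mean"))
```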
    

    【Discussion】:

      【Solution 2】:

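      In base R, we can use `grep` to select the columns whose names start with "light", compute the `mean` of each list element, and add the results as new columns.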

      cols <- grep('^light', names(df), value = TRUE)
      df[paste0(cols, "_mean")] <- lapply(df[cols], function(x) sapply(x, mean))
      df
      
      # A tibble: 3 x 7
      #      x light_93  light_94  light_95  light_93_mean light_94_mean light_95_mean
      #  <int> <list>    <list>    <list>            <dbl>         <dbl>         <dbl>
      #1     1 <int [3]> <int [3]> <int [3]>             2             4             6
      #2     2 <int [3]> <int [3]> <int [3]>             6            10            45
      #3     3 <int [3]> <int [3]> <int [3]>            19            19            31
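The same base R pattern can be wrapped in a reusable function. This is a minimal sketch with a hypothetical helper, `add_list_summary()`, which is not part of the answer. It swaps `sapply` for `vapply(..., numeric(1))` so that a summary returning anything other than a single number raises an error instead of silently producing a list, and it uses an anchored pattern so columns added by an earlier call are not matched again.

```r
# Hypothetical helper (not from the answer): apply `fun` to every element of
# the list-columns whose names match `pattern` and store the results as new
# numeric columns named "<column>_<suffix>".
add_list_summary <- function(data, pattern, fun, suffix) {
  cols <- grep(pattern, names(data), value = TRUE)
  # vapply() requires exactly one number per element (integers are promoted
  # to double), so a vector-valued summary fails loudly.
  data[paste0(cols, "_", suffix)] <-
    lapply(data[cols], function(x) vapply(x, fun, numeric(1)))
  data
}

# The question's data as a plain data.frame with list-columns.
df_base <- data.frame(x = 1:3)
df_base$light_93 <- list(1:3, 5:7, 18:20)
df_base$light_94 <- list(3:5, 9:11, 18:20)
df_base$light_95 <- list(5:7, 44:46, 30:32)

# "^light_[0-9]+$" matches light_93 but not light_93_mean, so the second
# call summarises only the original list-columns.
res <- add_list_summary(df_base, "^light_[0-9]+$", mean, "mean")
res <- add_list_summary(res, "^light_[0-9]+$", max, "max")
```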
      

      【Discussion】:
