【Question title】: R data.table dynamic column name of group by returning new table
【Posted】: 2020-01-03 09:44:16
【Question description】:

By default, a group-by operation on a data.table returns a new data.table whose computed column is automatically named V1:

library(data.table)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
dt[, mean(a), by = id]

#     id V1
# 1:  1 48.2
# 2:  2 47.9
# 3:  3 46.8
# 4:  4 54.7
# 5:  5 63.7
# 6:  6 50.6
# 7:  7 43.3
# 8:  8 52.7
# 9:  9 45.4
# 10: 10 51.7

Following this post, I can set the name of the column like this:

dt[, list(mean = mean(a)), by = id]

Is it possible to take the column name from a variable? For example, instead of hard-coding mean, I would like to do something like this:

column_name <- "mean"
dt[, list(column_name = mean(a)), by = id]  # resulting column name is column_name (and not mean)
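The behaviour above is not specific to data.table: in a call such as list(name = value), R treats whatever is written to the left of = as the literal argument name and never evaluates it. A minimal base R sketch of this (the variable x is only for illustration):

```r
column_name <- "mean"

# The text left of `=` is used verbatim as the element name;
# the variable column_name is never looked up here.
x <- list(column_name = 1)
names(x)
```

Here names(x) is "column_name" rather than "mean", which is why the grouped result gets a column called column_name.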

【Question comments】:

    Tags: r data.table


    【Solution 1】:

    We can use setNames to attach the name to the list returned in j:

    library(data.table)
    dt[, setNames(list(mean(a)), column_name), by = id]
    
    #    id mean
    # 1:  1 56.8
    # 2:  2 50.5
    # 3:  3 50.5
    # 4:  4 42.4
    # 5:  5 49.9
    # 6:  6 47.8
    # 7:  7 60.6
    # 8:  8 57.4
    # 9:  9 54.6
    #10: 10 34.5
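
    The same pattern should extend to several columns whose names are only known at run time. A minimal sketch, assuming a hypothetical character vector out_names (not part of the original answer) that holds one name per computed value:

    ```r
    library(data.table)

    set.seed(123)
    dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

    # Hypothetical names chosen at run time, one per summary below
    out_names <- c("mean_a", "sd_a")

    # Build an unnamed list per group, then attach the names with setNames()
    res <- dt[, setNames(list(mean(a), sd(a)), out_names), by = id]
    names(res)
    ```

    The result has the columns id, mean_a and sd_a. The length of out_names has to match the number of elements in the list.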
    

    Data

    set.seed(123)
    dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10,10))
    column_name <- "mean"
    

    【Comments】:

    【Solution 2】:

    We can use setnames from data.table to rename the automatically generated V1 column after grouping:

    library(data.table)
    setnames(dt[, .(mean(a)), by = id], 'V1', column_name)[]
    #    id mean
    # 1:  1 56.8
    # 2:  2 50.5
    # 3:  3 50.5
    # 4:  4 42.4
    # 5:  5 49.9
    # 6:  6 47.8
    # 7:  7 60.6
    # 8:  8 57.4
    # 9:  9 54.6
    #10: 10 34.5
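
    Note that setnames renames by reference. In the one-liner above it acts on the temporary table produced by the aggregation, so dt itself keeps its column names. A step-by-step sketch of the same call, using a hypothetical intermediate variable agg:

    ```r
    library(data.table)

    set.seed(123)
    dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
    column_name <- "mean"

    agg <- dt[, .(mean(a)), by = id]   # new table; the computed column is auto-named V1
    setnames(agg, "V1", column_name)   # renames agg in place, without copying it

    names(agg)   # columns of the aggregated table
    names(dt)    # columns of the source table
    ```

    After these steps names(agg) is "id" "mean", while names(dt) is still "a" "b" "id". The trailing [] in the one-liner is only there to make the result print, because setnames returns its value invisibly.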
    

    Data

    set.seed(123)
    dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10,10))
    column_name <- "mean"
    

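    【Comments】:

      【Solution 3】:

      For completeness, you can also use a loop that returns a named list, for example with Map(). When the first argument passed to Map() is a character vector, its values are used as the names of the result: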

      dt[
        , Map(
          function(i) {
            mean(a)
          }
          , i = "Mean"
        )
        , by = id
      ]
      

      Or for two or more function calls/columns:

      dt[
        , Map(
          function(i, fun) {
            do.call(
              fun
              , list(a)
            )
          }
          , i = c("Mean", "SD")
          , fun = c(mean, sd)
        )
        , by = id
      ]
      #     id Mean       SD
      #  1:  1 56.8 29.23012
      #  2:  2 50.5 26.18842
      #  3:  3 50.5 24.82047
      #  4:  4 42.4 34.72495
      #  5:  5 49.9 26.99979
      #  6:  6 47.8 28.35411
      #  7:  7 60.6 31.52142
      #  8:  8 57.4 32.22904
      #  9:  9 54.6 27.90141
      # 10: 10 34.5 30.94529
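
      A possible variation on the same idea, sketched here as an assumption rather than as part of the original answer: keep the names and the functions together in one named list (the hypothetical funs below) and loop over it with lapply(), which preserves the list names.

      ```r
      library(data.table)

      set.seed(123)
      dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

      # Hypothetical named list: each name becomes a column name
      funs <- list(Mean = mean, SD = sd)

      # lapply() keeps the names of funs, so j returns a named list per group
      res <- dt[, lapply(funs, function(f) f(a)), by = id]
      names(res)
      ```

      This gives the columns id, Mean and SD, and adding another summary only requires one more entry in funs.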
      

      【Comments】:
