【Question title】: R data.table sorting by group with "other" at bottom of each group
【Posted】: 2021-12-03 20:46:31
【Question description】:

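I can't quite get the syntax right. I have a data.table that I want to sort first by a grouping column g1 (a factor whose levels define the sort order), then by another column n in descending order. The only catch is that I want rows whose third column g2 is labeled "other" to appear at the bottom of each group, regardless of their value of n.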

Example:

library(data.table)

dt <- data.table(g1 = factor(rep(c('Australia', 'Mexico', 'Canada'), 3), levels = c('Australia', 'Canada', 'Mexico')),
                 g2 = rep(c('stuff', 'things', 'other'), each = 3),
                 n = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0))

Here is the expected output: within each g1, rows are sorted by n in descending order, except that rows with g2 == 'other' always come last:

         g1     g2     n
1: Australia things  5000
2: Australia  stuff  1000
3: Australia  other 10000
4:    Canada things  3500
5:    Canada  stuff  3000
6:    Canada  other     0
7:    Mexico  stuff  2000
8:    Mexico things   100
9:    Mexico  other 10000

【Question discussion】:

    Tags: r data.table


    【Solution 1】:

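    Use order() inside dt[...], which data.table internally optimizes to its own fast ordering, together with a - prefix for reverse (descending) ordering: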

    dt[order(g1, g2 == "other", -n), ]
    #           g1     g2     n
    #       <fctr> <char> <num>
    # 1: Australia things  5000
    # 2: Australia  stuff  1000
    # 3: Australia  other 10000
    # 4:    Canada things  3500
    # 5:    Canada  stuff  3000
    # 6:    Canada  other     0
    # 7:    Mexico  stuff  2000
    # 8:    Mexico things   100
    # 9:    Mexico  other 10000
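A self-contained sketch of the call above, assuming only that the data.table package is installed. It rebuilds the example table from the question and checks the g2 column against the expected output. The last lines illustrate that the - prefix also reverses a character column inside dt[...], whereas base R's order(-x) on a character vector raises an error. The names res, expected_g2 and by_g2_desc are introduced here for illustration only.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  g1 = factor(rep(c("Australia", "Mexico", "Canada"), 3),
              levels = c("Australia", "Canada", "Mexico")),
  g2 = rep(c("stuff", "things", "other"), each = 3),
  n  = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0)
)

# Sort keys, in priority order:
#   g1             -> factor level order (Australia, Canada, Mexico)
#   g2 == "other"  -> FALSE rows before TRUE rows, so "other" ends up last
#   -n             -> n descending among the remaining ties
res <- dt[order(g1, g2 == "other", -n)]
print(res)

# g2 column of the expected output shown in the question
expected_g2 <- c("things", "stuff", "other",
                 "things", "stuff", "other",
                 "stuff", "things", "other")
stopifnot(identical(res$g2, expected_g2))

# Inside dt[...] the `-` prefix also reverses a character column;
# base::order(-dt$g2) would instead fail with
# "invalid argument to unary operator"
by_g2_desc <- dt[order(-g2)]
stopifnot(identical(unique(by_g2_desc$g2), c("things", "stuff", "other")))
```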
    

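    We add g2 == "other" as a sort key because you said "other" should always come last. For example, if "stuff" were renamed to "abc", we can see the difference in behavior with and without that key: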

    dt[ g2 == "stuff", g2 := "abc" ]
    dt[order(g1, -n), ]
    #           g1     g2     n
    #       <fctr> <char> <num>
    # 1: Australia  other 10000
    # 2: Australia things  5000
    # 3: Australia    abc  1000
    # 4:    Canada things  3500
    # 5:    Canada    abc  3000
    # 6:    Canada  other     0
    # 7:    Mexico  other 10000
    # 8:    Mexico    abc  2000
    # 9:    Mexico things   100
    
    dt[order(g1, g2 == "other", -n), ]
    #           g1     g2     n
    #       <fctr> <char> <num>
    # 1: Australia things  5000
    # 2: Australia    abc  1000
    # 3: Australia  other 10000
    # 4:    Canada things  3500
    # 5:    Canada    abc  3000
    # 6:    Canada  other     0
    # 7:    Mexico    abc  2000
    # 8:    Mexico things   100
    # 9:    Mexico  other 10000
    

    One drawback of this approach is that setorder does not work with it directly:

    setorder(dt, g1, g2 == "other", -n)
    # Error in setorderv(x, cols, order, na.last) : 
    #   some columns are not in the data.table: ==,other
    

    So we need to reorder and assign the result back to dt, e.g. dt <- dt[order(g1, g2 == "other", -n)].
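A minimal sketch of two ways around the setorder() limitation, assuming the same dt as in the question. The first reassigns the result of dt[order(...)], which builds a reordered copy. The second keeps setorder()'s in-place sort by first storing the condition in a temporary column. The helper column name other_last is hypothetical, and dt1/dt2 exist only so the two variants can be compared; in practice you would apply either one to dt directly.

```r
library(data.table)

dt <- data.table(
  g1 = factor(rep(c("Australia", "Mexico", "Canada"), 3),
              levels = c("Australia", "Canada", "Mexico")),
  g2 = rep(c("stuff", "things", "other"), each = 3),
  n  = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0)
)

# Option 1: compute the new order and assign the (copied) result back
dt1 <- dt[order(g1, g2 == "other", -n)]

# Option 2: sort in place with setorder() via a temporary helper column.
# `other_last` is a hypothetical name: 0 = regular row, 1 = "other" row.
dt2 <- copy(dt)                                 # leave dt itself untouched
dt2[, other_last := as.integer(g2 == "other")]
setorder(dt2, g1, other_last, -n)               # sorts dt2 by reference
dt2[, other_last := NULL]                       # drop the helper again

# Both approaches give the same row order
stopifnot(identical(dt1$g2, dt2$g2), identical(dt1$n, dt2$n))
print(dt2)
```

The helper is stored as a plain 0/1 integer column so that setorder() only has to deal with ordinary column names; it is removed again with := NULL once the sort is done.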

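    As an aside: this works because g2 == "other" evaluates to a logical vector, and when sorting, FALSE is treated as 0 and TRUE as 1, so rows where the condition is FALSE come before rows where it is TRUE.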
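A quick base R check of that behavior (no data.table needed); x and g2 here are small throwaway vectors made up for illustration.

```r
# Logical values sort as FALSE (0) before TRUE (1)
x <- c(TRUE, FALSE, TRUE, FALSE)
print(as.integer(x))          # 1 0 1 0
print(order(x))               # 2 4 1 3 (ties keep their original order)

# The same rule pushes matching elements to the end
g2 <- c("other", "stuff", "things", "other")
print(order(g2 == "other"))   # 2 3 1 4
print(g2[order(g2 == "other")])
```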

    【Discussion】:

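    • This is great and gave me the solution I needed, but FYI the result I wanted actually requires dt[order(g1, g2 == "other", -n), ]
    • Thanks @qdread!
    • Many thanks @r2evans for the clear and very instructive answer. I have deleted my incorrect answer as a result.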