【Question title】: How to sort each column according to the observations in the same output?
【Posted】: 2021-11-08 12:29:03
【Question】:

I have this data frame:

df <- data.frame(obs = c("A","B","C","D","E"),
                 Var1 = c(3.7, 7.8, 8.9, 7.0, 3.4),
                 Var2 = c(2.7, 8.0, 1.0, 1.0, 2.0),
                 Var3 = c(9.1, 1.5, 2.7, 9.0, 5.0))

To order it the way I need, I did the following:

library(dplyr)

rbind(
  data.frame( obs= c("A","B","C","D","E"),
              Var1 = c(3.7, 7.8, 8.9, 7.0, 3.4),
              Var2 = rep("",5),
              Var3 = rep("",5)) %>%  
    arrange(-Var1),
  
  data.frame( obs= c("A","B","C","D","E"),
              Var1 = rep("",5),
              Var2 = c(2.7, 8.0, 1.0, 1.0, 2.0),
              Var3 = rep("",5)) %>%  
    arrange(-Var2),
  
  data.frame( obs = c("A","B","C","D","E"),
              Var1 = rep("",5),
              Var2 = rep("",5),
              Var3 = c(9.1, 1.5, 2.7, 9.0, 5.0)) %>%  
    arrange(-Var3)
)

Output:

How can I make this process more efficient and general, so that it works for many observations and columns?

【Question discussion】:

    Tags: r sorting


    【Solution 1】:

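    Reshape the data to long format, arrange it by variable and in descending order of value, add a row-number column, and reshape back to wide format.

    Using the dplyr and tidyr libraries, you can do this: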

    library(dplyr)
    library(tidyr)
    
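    df %>%
      pivot_longer(cols = -obs) %>%                            # one row per (obs, variable) pair
      arrange(name, desc(value)) %>%                           # sort by variable, then value descending
      mutate(row = row_number()) %>%                           # unique id so pivot_wider keeps every row
      pivot_wider(names_from = name, values_from = value) %>%  # back to wide; other cells become NA
      select(-row)                                             # drop the helper column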
    
    #  obs    Var1  Var2  Var3
    #  <chr> <dbl> <dbl> <dbl>
    # 1 C       8.9  NA    NA  
    # 2 B       7.8  NA    NA  
    # 3 D       7    NA    NA  
    # 4 A       3.7  NA    NA  
    # 5 E       3.4  NA    NA  
    # 6 B      NA     8    NA  
    # 7 A      NA     2.7  NA  
    # 8 E      NA     2    NA  
    # 9 C      NA     1    NA  
    #10 D      NA     1    NA  
    #11 A      NA    NA     9.1
    #12 D      NA    NA     9  
    #13 E      NA    NA     5  
    #14 C      NA    NA     2.7
    #15 B      NA    NA     1.5
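The same idea can also be sketched in base R without dplyr or tidyr. This is an illustration, not part of the original answer; the names value_cols, long and wide are chosen here for clarity, and the data frame is rebuilt so the block runs on its own. No row-number column is needed in this version, because each long row simply becomes one output row.

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(obs  = c("A", "B", "C", "D", "E"),
                 Var1 = c(3.7, 7.8, 8.9, 7.0, 3.4),
                 Var2 = c(2.7, 8.0, 1.0, 1.0, 2.0),
                 Var3 = c(9.1, 1.5, 2.7, 9.0, 5.0))
value_cols <- names(df)[-1]

# Step 1: long format, one row per (obs, variable) pair
long <- data.frame(obs   = rep(df$obs, times = length(value_cols)),
                   name  = rep(value_cols, each = nrow(df)),
                   value = unlist(df[value_cols], use.names = FALSE))

# Step 2: sort by variable name, then by value in descending order
long <- long[order(long$name, -long$value), ]

# Step 3: back to wide; each row keeps only its own variable's value, the rest are NA
wide <- data.frame(obs = long$obs)
for (v in value_cols) {
  wide[[v]] <- ifelse(long$name == v, long$value, NA)
}
wide
```

Ties (C and D in Var2) keep their original order, because order() leaves unresolved ties as they were.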
    

    【Discussion】:

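    • Thank you very much. Is it possible to keep the original column order? For example, a "var10" column ends up as the first column of the result.
    • Should "var10" be column 1 or column 10? Try changing the arrange line to arrange(order(gtools::mixedorder(name)), desc(value))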
    【Solution 2】:

    A fast solution using Map and base R only.

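    # Requires R >= 4.1 for the \(x, y, z) lambda shorthand
    r <- do.call(rbind,
                 Map(\(x, y, z) {x <- x[y, ]; x[-c(1, z)] <- NA; x},  # sort rows by column z; set all but obs and column z to NA
                     list(d), lapply(d[-1], order, decreasing = TRUE), seq(ncol(d))[-1]))
    r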
    #   obs Var1 Var2 Var3
    # 3    C  8.9   NA   NA
    # 2    B  7.8   NA   NA
    # 4    D  7.0   NA   NA
    # 1    A  3.7   NA   NA
    # 5    E  3.4   NA   NA
    # 21   B   NA  8.0   NA
    # 11   A   NA  2.7   NA
    # 51   E   NA  2.0   NA
    # 31   C   NA  1.0   NA
    # 41   D   NA  1.0   NA
    # 12   A   NA   NA  9.1
    # 42   D   NA   NA  9.0
    # 52   E   NA   NA  5.0
    # 32   C   NA   NA  2.7
    # 22   B   NA   NA  1.5
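The Map call walks three arguments in parallel: the data frame (wrapped in list() so it is recycled), one descending ordering per value column, and that column's index. The same logic written as a plain lapply() over column names may be easier to read. This is an equivalent sketch, not the answer's own code; value_cols, blocks and r2 are names chosen for illustration, and the row names are reset at the end.

```r
d <- data.frame(obs  = c("A", "B", "C", "D", "E"),
                Var1 = c(3.7, 7.8, 8.9, 7.0, 3.4),
                Var2 = c(2.7, 8.0, 1.0, 1.0, 2.0),
                Var3 = c(9.1, 1.5, 2.7, 9.0, 5.0))
value_cols <- names(d)[-1]

blocks <- lapply(value_cols, function(col) {
  block <- d[order(d[[col]], decreasing = TRUE), ]  # all rows, sorted by this column
  block[setdiff(value_cols, col)] <- NA             # blank the other value columns
  block
})

r2 <- do.call(rbind, blocks)  # stack the per-column blocks
rownames(r2) <- NULL          # drop the de-duplicated row names rbind creates
r2
```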
    

    Benchmark

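    This should be more efficient. In the timings below (on the 5-row example data), the median for the Map version is about 0.98 ms versus about 15.4 ms for the dplyr version, roughly 16 times faster.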

    # Unit: microseconds
    #  expr       min        lq       mean     median        uq      max neval cld
    #   Map   876.396   900.545   970.8851   981.9155  1014.284   1145.8   100  a 
    # dplyr 14744.610 14963.718 17997.4444 15439.0440 15752.575 241073.9   100   b
    

    Data:

    d <- structure(list(obs = c("A", "B", "C", "D", "E"), Var1 = c(3.7, 
    7.8, 8.9, 7, 3.4), Var2 = c(2.7, 8, 1, 1, 2), Var3 = c(9.1, 1.5, 
    2.7, 9, 5)), class = "data.frame", row.names = c(NA, -5L))
    
