Sorting data frame by column, adding index within group: answers

【Question title】: Sorting data frame by column, adding index within group
【Posted】: 2016-10-21 00:29:36
【Question description】:

This question describes the setup of my problem well.

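However, instead of a second value I have a factor called algorithm. My data frame looks like this (note that the same value can appear more than once, even within a group):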

algorithm <- c("global", "distributed", "distributed", "none", "global", "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
# stringsAsFactors = TRUE keeps algorithm a factor on R >= 4.0, where the default became FALSE
df <- data.frame(algorithm, v, stringsAsFactors = TRUE)
df
#    algorithm  v
#1      global  5
#2 distributed  2
#3 distributed  6
#4        none  7
#5      global  3
#6      global  1
#7 distributed 10
#8        none  2
#9        none  2

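I want to sort the data frame by v, but get each entry's sorted position relative to its group (algorithm). This position should then be added to the original data frame (so I don't have to rearrange it), because I want to use ggplot to plot the computed position as x and the value as y, grouped by algorithm (i.e. each algorithm is one set of points).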

So the result should look like this:

    algorithm  v  groupIndex
1      global  5  3
2 distributed  2  1
3 distributed  6  2
4        none  7  3
5      global  3  2
6      global  1  1
7 distributed 10  3
8        none  2  1
9        none  2  2

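So far, I know I can sort the data by algorithm first and then by value, or vice versa. I suppose that in a second step I would have to compute the index within each group? Is there a simple way to do this?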

df[order(df$algorithm, df$v), ]
#    algorithm  v
#2 distributed  2
#3 distributed  6
#7 distributed 10
#6      global  1
#5      global  3
#1      global  5
#8        none  2
#9        none  2
#4        none  7
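The two-step idea above can be sketched in base R roughly like this (a minimal sketch, not one of the posted answers; the names ord and idx_sorted are illustrative only). It sorts by algorithm and v, numbers the sorted rows 1, 2, ... within each group, and then writes those numbers back to the original row positions, so df itself is never reordered. The example data is rebuilt so the block runs on its own.

```r
# Example data from the question
algorithm <- c("global", "distributed", "distributed", "none",
               "global", "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v, stringsAsFactors = TRUE)

# Step 1: row indices that sort by group, then by value within the group
ord <- order(df$algorithm, df$v)

# Step 2: running count 1, 2, ... within each group of the sorted rows
idx_sorted <- ave(seq_along(ord), df$algorithm[ord], FUN = seq_along)

# Write the counts back to the original row positions (df keeps its order)
df$groupIndex <- integer(nrow(df))
df$groupIndex[ord] <- idx_sorted
df
```

Because the counting restarts in every group, this does not depend on the groups having the same number of rows.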

Edit: the groups are not guaranteed to have the same number of entries!

【Question discussion】:

Tags: r sorting dataframe

【Solution 1】:

Applying order twice within each group should cover it:

ave(df$v, df$algorithm, FUN=function(x) order(order(x)) )
#[1] 3 1 2 3 2 1 3 1 2
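Why the double order works can be seen on a single group. order(x) returns the permutation that sorts x, and calling order() on a permutation returns its inverse, i.e. the position each original element lands in after sorting, which is its rank. A small base R illustration on the values of the none group (the variable names are illustrative):

```r
# One group's values, e.g. the "none" rows of the example
x <- c(7, 2, 2)

# order(x): index of the 1st, 2nd, 3rd element in sorted order
p <- order(x)        # 2 3 1

# order() of a permutation is its inverse: where each element ends up
r <- order(p)        # 3 1 2

# The same inverse permutation built directly, without a second sort
r2 <- integer(length(p))
r2[p] <- seq_along(p)

stopifnot(identical(r, r2))
```

Ties (the two 2s) are broken by original position, because order() leaves unresolved ties in their original order.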

This is also equivalent to:

ave(df$v, df$algorithm, FUN=function(x) rank(x,ties.method="first") )
#[1] 3 1 2 3 2 1 3 1 2
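The ties.method argument is what makes the two calls equivalent. With rank()'s default "average", tied values such as the two 2s in the none group share a fractional rank, which would put two points at the same x position in the plot. A quick base R comparison on that group's values:

```r
# Values of the "none" group from the example: 7, 2, 2 (one tie)
x <- c(7, 2, 2)

avg   <- rank(x)                          # default "average": 3.0 1.5 1.5
first <- rank(x, ties.method = "first")   # 3 1 2, ties broken by position
mins  <- rank(x, ties.method = "min")     # 3 1 1

# "first" reproduces the double-order result
stopifnot(all(first == order(order(x))))
```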

This in turn means that if speed is a concern, you can use frank from data.table:

library(data.table)
# setDT() converts df to a data.table in place (by reference)
setDT(df)[, grpidx := frank(v, ties.method = "first"), by = algorithm]
df
#     algorithm  v grpidx
#1:      global  5      3
#2: distributed  2      1
#3: distributed  6      2
#4:        none  7      3
#5:      global  3      2
#6:      global  1      1
#7: distributed 10      3
#8:        none  2      1
#9:        none  2      2

【Discussion】:

【Solution 2】:

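Here is one approach. I think you can use with_order() to order the v values within each group, passing row_number() as the function to assign the ranks. That way you skip the separate step of arranging each group's data, which is what you were attempting with order().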

library(dplyr)
group_by(df, algorithm) %>%
  mutate(groupInd = with_order(order_by = v, fun = row_number, x = v))

#    algorithm     v groupInd
#       <fctr> <dbl>    <int>
#1      global     5        3
#2 distributed     2        1
#3 distributed     6        2
#4        none     7        3
#5      global     3        2
#6      global     1        1
#7 distributed    10        3
#8        none     2        1
#9        none     2        2
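The idea behind with_order() can be sketched in base R with a hypothetical helper, with_order_sketch() (not part of dplyr, and not dplyr's actual implementation): sort x by order_by, apply the function to the sorted values, then undo the sort so results line up with the original rows. Here seq_along stands in for row_number, and ave() plays the role of group_by() plus mutate().

```r
# Hypothetical helper illustrating the idea behind dplyr::with_order():
# sort x by order_by, apply fun to the sorted values, then undo the sort.
with_order_sketch <- function(order_by, fun, x) {
  ord <- order(order_by)
  out <- fun(x[ord])
  out[order(ord)]   # inverse permutation restores the original positions
}

# Within one group, a running position over the sorted values is a rank
one_group <- with_order_sketch(c(7, 2, 2), seq_along, c(7, 2, 2))  # 3 1 2

# Applied per group with base R's ave(), mirroring group_by() + mutate()
algorithm <- c("global", "distributed", "distributed", "none",
               "global", "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
res <- ave(v, algorithm, FUN = function(x) with_order_sketch(x, seq_along, x))
res
```

In dplyr itself, row_number(x) is documented as equivalent to rank(x, ties.method = "first"), so mutate(groupInd = row_number(v)) after group_by(algorithm) should give the same column without with_order().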

【Discussion】: