Answers: Creating a ranking column based on values in two other columns using mutate and min_rank

【Question title】: Creating a ranking column based on values in two other columns using mutate and min_rank
【Posted】: 2020-06-18 17:20:03
【Question】:

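I'm revisiting some old code in which I used a for loop to compute a combined ranking of genes based on two columns. My end goal is a column that gives, for each gene, the proportion of genes in the dataset that it outperforms.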

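I have a data.frame called scores with two columns of scores associated with my genes. To compute the combined ranking I use the following for loop, and I then compute a proportion score by dividing the resulting rank by the total number of observations.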

scores <- data.frame(x = c(0.128, 0.279, 0.501, 0.755, 0.613), y = c(1.49, 1.43, 0.744, 0.647, 0.380))

#Calculate ranking
comb.score = matrix(0, nrow = nrow(scores), ncol = 1)
for(i in 1:nrow(scores)){
  comb.score[i] = length(which(scores[ , 1] < scores[i, 1] & scores[ , 2] < scores[i, 2]))
}

comb.score <- comb.score/length(comb.score) #Calculate proportion
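
For comparison, here is a minimal base-R sketch of the same count without an explicit loop. It is an assumption-labelled alternative (not from the question): outer() builds an n x n logical matrix of pairwise comparisons, so memory grows with the square of the number of genes.

```r
scores <- data.frame(x = c(0.128, 0.279, 0.501, 0.755, 0.613),
                     y = c(1.49, 1.43, 0.744, 0.647, 0.380))

# beats[i, j] is TRUE when gene i has a strictly larger x AND a strictly larger y
# than gene j, i.e. the same condition as the loop's scores[j, ] < scores[i, ]
beats <- outer(scores$x, scores$x, ">") & outer(scores$y, scores$y, ">")

# Row sums count how many genes each gene beats; dividing by n gives the proportion
comb.score <- rowSums(beats) / nrow(scores)
comb.score
```

This returns 0, 0, 0, 0.2, 0, the same values as the loop. For a very large gene set the quadratic matrix may become the bottleneck, in which case a row-by-row approach such as the answers use keeps memory linear.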

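Now that I'm more comfortable with the tidyverse, I'd like to convert this code to use tidyverse functions, but I haven't been able to figure it out myself or find a solution among answers on SO or the RStudio Community.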

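My idea was to use mutate() and min_rank(), but I'm not entirely sure of the syntax. Also, min_rank() seems to evaluate ranks with a logical test like scores[ , 1] <= scores[i, 1], rather than the strict < that I used in my original test.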
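
A quick way to see what min_rank() counts is to compare it with an explicit strict comparison on a single column. The sketch assumes dplyr is installed; it shows that min_rank() is 1 plus the number of strictly smaller values (ties share the lowest rank), so subtracting 1 recovers the loop's strict < count for one column.

```r
library(dplyr)

x <- c(0.128, 0.279, 0.501, 0.755, 0.613)

# min_rank(x) - 1: number of values strictly smaller than each element
min_rank(x) - 1

# Ties share the lowest rank: the two 2s both get rank 2, and 3 gets rank 4
min_rank(c(1, 2, 2, 3))

# The same "strictly smaller" count written as an explicit comparison
sapply(x, function(v) sum(x < v))
```

Because min_rank() ranks one vector at a time, it cannot by itself express the two-column condition (x smaller AND y smaller), which is why a pairwise comparison is still needed.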

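My expected result is an additional column in the scores table with the same values as the comb.score output of the code above: a score giving the proportion of genes in the whole dataset that the gene in a given row outperforms.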

Any help would be greatly appreciated! Let me know if I need to clarify anything or add more information.

【Question comments】:

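What does your people data frame look like? What is your expected output?
The people data.frame was a typo. I've updated my question to be more specific and to state my expected output.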

Tags: r dplyr

【Solution 1】:

Interesting question. I'd suggest this:

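library(dplyr)

scores %>%
  rowwise() %>%
  # for each row, count the genes with both a smaller x and a smaller y
  mutate(comb_score = sum(x > .$x & y > .$y)) %>%
  ungroup() %>%
  # convert the count into a proportion of all genes
  mutate(comb_score = comb_score/n())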

gives

# A tibble: 5 x 3
      x     y comb_score
  <dbl> <dbl>      <dbl>
1 0.128 1.49         0  
2 0.279 1.43         0  
3 0.501 0.744        0  
4 0.755 0.647        0.2
5 0.613 0.38         0

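【Comments】:

Could you explain the .$x and .$y parts of your answer?
@Mcmahoon89 .$x is just another way of writing scores$x. Within rowwise(), x > .$x creates a logical vector comparing the entire "column" scores$x with the current row's value of x. It's basically a vectorized version of scores[i, 1] > scores[ , 1].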
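
To make the explanation concrete, here is the comparison that rowwise() performs for a single row, written out by hand with scores$x in place of .$x. Row 4 is picked purely for illustration because it is the only row with a non-zero result.

```r
scores <- data.frame(x = c(0.128, 0.279, 0.501, 0.755, 0.613),
                     y = c(1.49, 1.43, 0.744, 0.647, 0.380))

# Row 4 (x = 0.755, y = 0.647) compared against every row at once
beats_x <- scores$x[4] > scores$x   # TRUE TRUE TRUE FALSE TRUE
beats_y <- scores$y[4] > scores$y   # FALSE FALSE FALSE FALSE TRUE
sum(beats_x & beats_y)              # 1 gene beaten, i.e. 1/5 = 0.2
```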

【Solution 2】:

Somewhat similar to Martin's answer, but using pmap instead.

library(tidyverse)

scores <- data.frame(
    x = c(0.128, 0.279, 0.501, 0.755, 0.613), 
    y = c(1.49, 1.43, 0.744, 0.647, 0.380)
)

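scores %>% 
  mutate(
    # pmap() returns a list, so the division by n() happens inside the lambda;
    # ..1 and ..2 are the current row's x and y, while x and y are the full columns
    score = pmap(list(x, y), ~ sum(..1 > x & ..2 > y) / n())
  )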
#>       x     y score
#> 1 0.128 1.490     0
#> 2 0.279 1.430     0
#> 3 0.501 0.744     0
#> 4 0.755 0.647   0.2
#> 5 0.613 0.380     0

Created on 2020-06-18 by the reprex package (v0.3.0)
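
If a plain numeric column is preferred over the list column that pmap() produces, purrr's pmap_dbl() can be swapped in. This is a sketch of that variant, assuming dplyr and purrr are installed; because pmap_dbl() returns a double vector, the division by n() can sit outside the mapping call.

```r
library(dplyr)
library(purrr)

scores <- data.frame(x = c(0.128, 0.279, 0.501, 0.755, 0.613),
                     y = c(1.49, 1.43, 0.744, 0.647, 0.380))

res <- scores %>%
  mutate(
    # pmap_dbl() returns a double vector, so dividing by n() afterwards is valid
    score = pmap_dbl(list(x, y), ~ sum(..1 > x & ..2 > y)) / n()
  )
res
```

The score column then holds 0, 0, 0, 0.2, 0 as ordinary numbers, which is easier to sort, filter, or join on than a list column.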
