【Question title】: How do I build a rank function that ranks ascending or descending based on defined column names?
【Posted】: 2019-01-31 17:35:59
【Question】:

I want to build a function that ranks variables in ascending or descending order, depending on the variable names defined in the function.

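I can do the ranking manually, but I want to be able to call a function to streamline the code for my df. I'm looking for someone to show me how to apply the function to both a wide and a long df. My sample code is below. I want tov and minutes ranked in ascending order, and all other columns ranked in descending order. Could someone show me how to write the function so that I define the variable names for both the ascending and the descending variables? I'd also like a second option where I only define the variables to rank in descending order, and all other columns default to ascending.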

library(tidyverse)

df <- tibble::tribble(
                ~Name, ~Team, ~minutes, ~ftm, ~fta, ~oreb, ~dreb, ~treb, ~ast, ~stl, ~blk, ~tov, ~pts, ~eff,
  "Russell Westbrook", "OKC",     34.6,  8.8, 10.4,   1.7,     9,  10.7, 10.4,  1.6,  0.4,  5.4, 31.6, 33.8,
       "James Harden", "HOU",     36.4,  9.2, 10.9,   1.2,     7,   8.1, 11.2,  1.5,  0.5,  5.7, 29.1, 32.4,
      "Isaiah Thomas", "BOS",     33.8,  7.8,  8.5,   0.6,   2.1,   2.7,  5.9,  0.9,  0.2,  2.8, 28.9, 24.7,
      "Anthony Davis", "NOP",     36.1,  6.9,  8.6,   2.3,   9.5,  11.8,  2.1,  1.3,  2.2,  2.4,   28, 31.1,
      "DeMar DeRozan", "TOR",     35.4,  7.4,  8.7,   0.9,   4.3,   5.2,  3.9,  1.1,  0.2,  2.4, 27.3, 22.7,
     "Damian Lillard", "POR",     35.9,  6.5,  7.3,   0.6,   4.3,   4.9,  5.9,  0.9,  0.3,  2.6,   27, 24.5,
   "DeMarcus Cousins", "NOP",     34.2,  7.2,  9.3,   2.1,   8.9,    11,  4.6,  1.4,  1.3,  3.7,   27, 28.5,
       "LeBron James", "CLE",     37.8,  4.8,  7.2,   1.3,   7.3,   8.6,  8.7,  1.2,  0.6,  4.1, 26.4,   31,
      "Kawhi Leonard", "SAS",     33.4,  6.3,  7.2,   1.1,   4.7,   5.8,  3.5,  1.8,  0.7,  2.1, 25.5, 25.3,
      "Stephen Curry", "GSW",     33.4,  4.1,  4.6,   0.8,   3.7,   4.5,  6.6,  1.8,  0.2,    3, 25.3, 25.2
  )

df_wide <- df %>% 
  mutate_at(vars(ftm, ast), funs(rank = rank(desc(.)))) %>%
  mutate_at(vars(tov, minutes), funs(rank = rank((.))))

df_wide
#> # A tibble: 10 x 18
#>    Name  Team  minutes   ftm   fta  oreb  dreb  treb   ast   stl   blk
#>    <chr> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Russ~ OKC      34.6   8.8  10.4   1.7   9    10.7  10.4   1.6   0.4
#>  2 Jame~ HOU      36.4   9.2  10.9   1.2   7     8.1  11.2   1.5   0.5
#>  3 Isai~ BOS      33.8   7.8   8.5   0.6   2.1   2.7   5.9   0.9   0.2
#>  4 Anth~ NOP      36.1   6.9   8.6   2.3   9.5  11.8   2.1   1.3   2.2
#>  5 DeMa~ TOR      35.4   7.4   8.7   0.9   4.3   5.2   3.9   1.1   0.2
#>  6 Dami~ POR      35.9   6.5   7.3   0.6   4.3   4.9   5.9   0.9   0.3
#>  7 DeMa~ NOP      34.2   7.2   9.3   2.1   8.9  11     4.6   1.4   1.3
#>  8 LeBr~ CLE      37.8   4.8   7.2   1.3   7.3   8.6   8.7   1.2   0.6
#>  9 Kawh~ SAS      33.4   6.3   7.2   1.1   4.7   5.8   3.5   1.8   0.7
#> 10 Step~ GSW      33.4   4.1   4.6   0.8   3.7   4.5   6.6   1.8   0.2
#> # ... with 7 more variables: tov <dbl>, pts <dbl>, eff <dbl>,
#> #   ftm_rank <dbl>, ast_rank <dbl>, tov_rank <dbl>, minutes_rank <dbl>

df_long <- df %>%
  gather(key = data_col, value = "stat_value", 3:14) %>% 
  group_by(data_col) %>% 
  mutate(rank = if_else(data_col %in% c("tov", "minutes"),
                        rank(stat_value, ties.method = "first"),
                        rank(-stat_value, ties.method = "first")))

df_long
#> # A tibble: 120 x 5
#> # Groups:   data_col [12]
#>    Name              Team  data_col stat_value  rank
#>    <chr>             <chr> <chr>         <dbl> <int>
#>  1 Russell Westbrook OKC   minutes        34.6     5
#>  2 James Harden      HOU   minutes        36.4     9
#>  3 Isaiah Thomas     BOS   minutes        33.8     3
#>  4 Anthony Davis     NOP   minutes        36.1     8
#>  5 DeMar DeRozan     TOR   minutes        35.4     6
#>  6 Damian Lillard    POR   minutes        35.9     7
#>  7 DeMarcus Cousins  NOP   minutes        34.2     4
#>  8 LeBron James      CLE   minutes        37.8    10
#>  9 Kawhi Leonard     SAS   minutes        33.4     1
#> 10 Stephen Curry     GSW   minutes        33.4     2
#> # ... with 110 more rows

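The output I want is the same as the dfs shown above. I'm looking for a function that replaces the manual if_else and the two mutate_at lines above. Suppose the function is called stat_rank. I would like the code to work like this: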

df_wide <- df %>% 
  mutate_at(vars(ftm, ast, tov, minutes), funs(rank = stat_rank(.)))


df_long <- df %>%
  gather(key = data_col, value = "stat_value", 3:14) %>% 
  group_by(data_col) %>% 
  mutate(rank = stat_rank(stat_value))

【Comments】:

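  • What is your expected output? You say you "am able to do the ranks manually, but I want to be able to call on the function in order to streamline the code for my df". You have shown two code snippets. Show us the problem in code.
  • I just edited my question above to show example results and how I envision the function working.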

Tags: r function dplyr


【Solution 1】:

If a function is needed, then:

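stat_rank <- function(x) {
  # Recover the name of the column passed in (e.g. "ftm")
  col1 <- deparse(substitute(x))
  if (col1 %in% c('ftm', 'ast')) {
    rank(desc(x))  # descending: largest value gets rank 1
  } else {
    rank(x)        # ascending: smallest value gets rank 1
  }
}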

df %>% 
   mutate_at(vars(ftm, ast, tov, minutes), funs(rank = stat_rank))
# A tibble: 10 x 18
#   Name         Team  minutes   ftm   fta  oreb  dreb  treb   ast   stl   blk   tov   pts   eff ftm_rank ast_rank tov_rank minutes_rank
#   <chr>        <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl>        <dbl>
# 1 Russell Wes… OKC      34.6   8.8  10.4   1.7   9    10.7  10.4   1.6   0.4   5.4  31.6  33.8        2      2        9            5  
# 2 James Harden HOU      36.4   9.2  10.9   1.2   7     8.1  11.2   1.5   0.5   5.7  29.1  32.4        1      1       10            9  
# 3 Isaiah Thom… BOS      33.8   7.8   8.5   0.6   2.1   2.7   5.9   0.9   0.2   2.8  28.9  24.7        3      5.5      5            3  
# 4 Anthony Dav… NOP      36.1   6.9   8.6   2.3   9.5  11.8   2.1   1.3   2.2   2.4  28    31.1        6     10        2.5          8  
# 5 DeMar DeRoz… TOR      35.4   7.4   8.7   0.9   4.3   5.2   3.9   1.1   0.2   2.4  27.3  22.7        4      8        2.5          6  
# 6 Damian Lill… POR      35.9   6.5   7.3   0.6   4.3   4.9   5.9   0.9   0.3   2.6  27    24.5        7      5.5      4            7  
# 7 DeMarcus Co… NOP      34.2   7.2   9.3   2.1   8.9  11     4.6   1.4   1.3   3.7  27    28.5        5      7        7            4  
# 8 LeBron James CLE      37.8   4.8   7.2   1.3   7.3   8.6   8.7   1.2   0.6   4.1  26.4  31          9      3        8           10  
# 9 Kawhi Leona… SAS      33.4   6.3   7.2   1.1   4.7   5.8   3.5   1.8   0.7   2.1  25.5  25.3        8      9        1            1.5
#10 Stephen Cur… GSW      33.4   4.1   4.6   0.8   3.7   4.5   6.6   1.8   0.2   3    25.3  25.2       10      4        6            1.5
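The hard-coded `stat_rank()` works because `funs()` builds a call such as `stat_rank(ftm)`, so `deparse(substitute(x))` sees the column symbol. When the column is passed by value instead, the name is lost. A minimal base-R sketch of both cases (`show_name()` is a hypothetical helper used only for illustration):

```r
# Hypothetical helper: returns the expression passed as x, as a string.
show_name <- function(x) deparse(substitute(x))

ftm <- c(8.8, 9.2, 7.8)

# Called with a bare symbol, substitute() returns that symbol.
direct <- show_name(ftm)

# lapply() builds the call as FUN(X[[i]], ...), so the column name is lost.
via_lapply <- lapply(list(ftm = ftm), show_name)[["ftm"]]

print(direct)      # [1] "ftm"
print(via_lapply)  # not "ftm" (typically "X[[i]]")
```

This ties the approach to `funs()`, which has been deprecated since dplyr 0.8.0. In dplyr 1.0 and later, `across()` generally passes each column to the function as a value; inside `across()`, `cur_column()` returns the current column name as a string, which is a sturdier way to choose the direction than `deparse(substitute())`.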

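Note that in the implementation above, the column names are hard-coded in the function. If more flexibility is needed, the column names can be passed as another argument: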

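stat_rank <- function(x, descCols) {
  # Recover the name of the column passed in (e.g. "ftm")
  col1 <- deparse(substitute(x))
  if (col1 %in% descCols) {
    rank(desc(x))  # columns listed in descCols: descending
  } else {
    rank(x)        # all other columns: ascending
  }
}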

df %>% 
   mutate_at(vars(ftm, ast, tov, minutes), 
           funs(rank = stat_rank(., descCols = c('ftm', 'ast'))))
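The question also asks for two calling styles: name both the ascending and the descending columns, or name only the descending ones and let everything else default to ascending. A minimal base-R sketch that supports both by taking column names as strings, which avoids `deparse(substitute())` altogether. `rank_cols()` and its arguments are hypothetical, not part of dplyr; for numeric columns, `rank(-x)` gives the same order as `rank(desc(x))`.

```r
# Hypothetical helper: add a <col>_rank column for each named column.
#   desc_cols:   ranked descending (largest value -> rank 1)
#   asc_cols:    ranked ascending; if NULL, every other numeric column
#                defaults to ascending
#   ties.method: passed straight through to base::rank()
rank_cols <- function(data, desc_cols, asc_cols = NULL,
                      ties.method = "average") {
  if (is.null(asc_cols)) {
    num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
    asc_cols <- setdiff(num_cols, desc_cols)
  }
  for (col in c(desc_cols, asc_cols)) {
    x <- data[[col]]
    # Negating a numeric vector reverses its order, so rank(-x) is descending.
    data[[paste0(col, "_rank")]] <- if (col %in% desc_cols) {
      rank(-x, ties.method = ties.method)
    } else {
      rank(x, ties.method = ties.method)
    }
  }
  data
}

stats <- data.frame(
  Name    = c("Westbrook", "Harden", "Thomas"),
  minutes = c(34.6, 36.4, 33.8),
  ftm     = c(8.8, 9.2, 7.8),
  tov     = c(5.4, 5.7, 2.8)
)

# Option 1: name both the descending and the ascending columns.
both <- rank_cols(stats, desc_cols = "ftm", asc_cols = c("tov", "minutes"))

# Option 2: name only the descending columns; the rest default to ascending.
defaulted <- rank_cols(stats, desc_cols = "ftm")
```

In option 2, `Name` is skipped because only numeric columns are given a default direction.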

For long-format data, you can use a function such as:

stat_rankL <- function(x, y, descCols) {
  # x: data_col values of the current group; y: the stat values to rank
  ifelse(x %in% descCols, rank(desc(y)), rank(y))
}
df %>%
   gather(key = data_col, value = "stat_value", 3:14) %>% 
   group_by(data_col) %>% 
   mutate(rank = stat_rankL(data_col, stat_value, c('ftm', 'ast')))
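`stat_rankL()` computes both `rank(desc(y))` and `rank(y)` for every group and then picks one element-wise. Because the direction is constant within a `data_col` group, flipping the sign of the descending stats and ranking once per group gives the same ordering. A minimal base-R sketch using `ave()`, on a small hypothetical subset of the data; it passes `ties.method = "first"` to match the question's manual code, whereas `stat_rankL()` uses `rank()`'s default `"average"` (which is why the two 33.4-minute players get 1.5 in the answer's wide output rather than 1 and 2).

```r
# Small long-format example (hypothetical subset of the question's data).
long <- data.frame(
  data_col   = rep(c("minutes", "ftm"), each = 3),
  stat_value = c(34.6, 36.4, 33.4,   # minutes: ranked ascending
                 8.8, 9.2, 7.8)      # ftm: ranked descending
)
desc_cols <- c("ftm", "ast")

# -1 for descending stats, +1 otherwise; the sign is constant within a group,
# so ranking sign * value inside each group ranks it in the right direction.
direction <- ifelse(long$data_col %in% desc_cols, -1, 1)
long$rank <- ave(direction * long$stat_value, long$data_col,
                 FUN = function(v) rank(v, ties.method = "first"))

print(long)
```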
