【Question title】: How to assign each instance of a factor a specific value?
【Posted】: 2020-02-10 22:53:18
【Question】:

Suppose I have a data frame that looks like this:

   playerID yearID salary
1 abbotje01   1998 175000
2 abbotje01   1999 255000
3 abbotje01   2000 255000
4 abbotje01   2001 300000
5 abbotku01   1993 109000
6 abbotku01   1994 109000
.
.
.

How can I get a data frame in which every row for a given playerID carries that player's salary from their most recent year, like this:

   playerID yearID salary
1 abbotje01   1998 300000
2 abbotje01   1999 300000
3 abbotje01   2000 300000
4 abbotje01   2001 300000
5 abbotku01   1993 109000
6 abbotku01   1994 109000

I want to keep every row for each playerID and simply overwrite the salary in each row with that same value.

    Tags: r


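    【Solution 1】:

    After grouping by 'playerID', find the index of the maximum 'yearID' within each group, extract the corresponding 'salary', and overwrite the 'salary' column with mutate: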

    library(dplyr)
    df1 %>%
         group_by(playerID) %>%
          mutate(salary = salary[which.max(yearID)])
    # A tibble: 6 x 3
    # Groups:   playerID [2]
    #  playerID  yearID salary
    #  <chr>      <int>  <int>
    #1 abbotje01   1998 300000
    #2 abbotje01   1999 300000
    #3 abbotje01   2000 300000
    #4 abbotje01   2001 300000
    #5 abbotku01   1993 109000
    #6 abbotku01   1994 109000
    

    Or with data.table:

    library(data.table)
    setDT(df1)[, salary := salary[which.max(yearID)], playerID]
    

    Data

    df1 <- structure(list(playerID = c("abbotje01", "abbotje01", "abbotje01", 
    "abbotje01", "abbotku01", "abbotku01"), yearID = c(1998L, 1999L, 
    2000L, 2001L, 1993L, 1994L), salary = c(175000L, 255000L, 255000L, 
    300000L, 109000L, 109000L)), class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6"))
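
The same `which.max` idea can also be sketched in base R without any packages. This is only an illustration, not part of the original answer: the helper variable `idx` is a hypothetical name, and the data frame is rebuilt inline so the block runs on its own. For each player, `ave` is applied to the row numbers to find the row holding the latest `yearID`, and the salary from that row is then copied to every row of the player.

```r
# Rebuild the sample data so the sketch is self-contained
df1 <- data.frame(
  playerID = c("abbotje01", "abbotje01", "abbotje01",
               "abbotje01", "abbotku01", "abbotku01"),
  yearID = c(1998L, 1999L, 2000L, 2001L, 1993L, 1994L),
  salary = c(175000L, 255000L, 255000L, 300000L, 109000L, 109000L)
)

# For every row, look up the row number of the same player's latest year.
# 'idx' is a hypothetical helper name used only for this illustration.
idx <- ave(seq_len(nrow(df1)), df1$playerID,
           FUN = function(i) i[which.max(df1$yearID[i])])

# Overwrite each salary with the salary from that latest-year row
df1$salary <- df1$salary[idx]
df1
```

Because the latest year is located with `which.max` rather than by position, this sketch does not depend on the rows being sorted.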
    

      【Solution 2】:

      We can order the data frame by yearID and then take the last salary within each group.

      This can be done in base R:

      df <- df[with(df, order(playerID, yearID)), ]
      df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) x[length(x)]))
      #Also
      #df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) tail(x, 1)))
      
      df
      
      #   playerID yearID salary final_salary
      #1 abbotje01   1998 175000       300000
      #2 abbotje01   1999 255000       300000
      #3 abbotje01   2000 255000       300000
      #4 abbotje01   2001 300000       300000
      #5 abbotku01   1993 109000       109000
      #6 abbotku01   1994 109000       109000
      

      dplyr

      library(dplyr)
      df %>%
        arrange(playerID, yearID) %>%
        group_by(playerID) %>%
        mutate(final_salary = last(salary))
      

      data.table

      library(data.table)
      
      setDT(df)
      df[order(yearID), final_salary := last(salary), playerID]
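
The approaches in this answer take the last salary in each group, so they rely on the rows being ordered by yearID first. A minimal base R sketch of why that step matters, using a made-up shuffled copy of one player's rows (the names `shuffled`, `unsorted_last`, `sorted`, and `sorted_last` are hypothetical and exist only for this illustration):

```r
# One player's rows in a deliberately scrambled order
shuffled <- data.frame(
  playerID = c("abbotje01", "abbotje01", "abbotje01", "abbotje01"),
  yearID = c(2001L, 1998L, 2000L, 1999L),
  salary = c(300000L, 175000L, 255000L, 255000L)
)

# Taking the last element without sorting picks the 1999 salary
unsorted_last <- with(shuffled,
                      ave(salary, playerID, FUN = function(x) x[length(x)]))

# Sorting by playerID and yearID first makes "last" mean "most recent year"
sorted <- shuffled[with(shuffled, order(playerID, yearID)), ]
sorted_last <- with(sorted,
                    ave(salary, playerID, FUN = function(x) x[length(x)]))

unsorted_last
sorted_last
```

Here `unsorted_last` holds the 1999 salary (255000) for every row, while `sorted_last` holds the 2001 salary (300000), which is what the question asks for.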
      
