【Question title】: CumSum that counts values only once based on group
【Posted】: 2018-09-08 20:14:01
【Question description】:

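I'm currently trying to create a cumulative sum column that builds a running total per player across Game_IDs, but counts the value tied to each Game_ID only once. For example, Player A took 20 shots in Game_ID == 1 and 13 shots in Game_ID == 2. For the cumulative sum, I want each Shot_Count value (per Game_ID) to be counted only once, even though it appears in the Shot_Count column multiple times. Consider the following dataset: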

Name         Game_ID       Shot_Count        CumSum_Shots
Player A         1             20                20 
Player B         1             15                15 
Player A         1             20                20
Player A         2             13                33 ## (20 + 13)
Player A         2             13                33 ## (20 + 13)
Player B         2             35                50 ## (15 + 35)
Player A         3             30                63 ## (33 + 30)
Player B         3             20                70 ## (50 + 20)
Player A         3             30                63 ## (33 + 30)
Player A         4             12                75 ## (63 + 12)
Player A         4             12                75 ## (63 + 12)
Player B         4             10                80 ## (70 + 10)

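Keep in mind that there are other variables that make rows such as 1 and 3 non-duplicates. I've reduced the dataset to the relevant variables only.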

I tried using cumsum with the data.table package:

library(data.table)
# Running total within each player/game group (restarts for every game)
dt[, CumSum_Shots := cumsum(Shot_Count), by = .(Name, Game_ID)]

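However, this sums the Shot_Count rows within each game (so the third row gets a CumSum_Shots of 40). It makes sense that the code does this, but I'm not sure what data.table syntax would make it consider only the unique values of dt$Game_ID.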
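A minimal reproduction of the attempt, assuming the 12 example rows above (rebuilt here with data.table(); by = .(Name, Game_ID) is the idiomatic equivalent of grouping by list(dt$Name, dt$Game_ID)). Grouping by both columns restarts the sum inside every game, so the repeated Player A row for game 1 accumulates to 40 instead of staying at 20:

```r
library(data.table)

# The example data from the question (only the relevant columns)
dt <- data.table(
  Name = c("Player A", "Player B", "Player A", "Player A", "Player A", "Player B",
           "Player A", "Player B", "Player A", "Player A", "Player A", "Player B"),
  Game_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  Shot_Count = c(20L, 15L, 20L, 13L, 13L, 35L, 30L, 20L, 30L, 12L, 12L, 10L)
)

# The attempted code: the running total never crosses a game boundary
dt[, CumSum_Shots := cumsum(Shot_Count), by = .(Name, Game_ID)]
print(dt)
```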

【Question discussion】:

  • If any of the solutions solved your problem, you should accept it

Tags: r data.table data-manipulation cumulative-sum


【Solution 1】:

Take the unique rows, compute the running total, then join it back:

library(data.table)
dt[unique(dt, by = c("Name", "Game_ID", "Shot_Count"))[
     , Cum_Shots := cumsum(Shot_Count), by = Name],  # running total over one row per game
   on = .(Name, Game_ID),
   Cum_Shots := i.Cum_Shots]                          # copy it onto every matching row of dt

R is a dirty language.
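A step-by-step sketch of the same unique, cumsum, join idea. The intermediate name per_game is hypothetical, and the setorder() call is an addition not in the answer: unique() keeps rows in first-appearance order, so sorting by Game_ID first makes the running total independent of how the input rows happen to be ordered. Like the answer, it assumes Shot_Count is identical on every row of a given player's game.

```r
library(data.table)

dt <- data.table(
  Name = c("Player A", "Player B", "Player A", "Player A", "Player A", "Player B",
           "Player A", "Player B", "Player A", "Player A", "Player A", "Player B"),
  Game_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  Shot_Count = c(20L, 15L, 20L, 13L, 13L, 35L, 30L, 20L, 30L, 12L, 12L, 10L)
)

# One row per player and game (Shot_Count assumed constant within a game)
per_game <- unique(dt, by = c("Name", "Game_ID", "Shot_Count"))
# Sort so the running total follows game order regardless of input row order
setorder(per_game, Name, Game_ID)
# Running total of shots per player, counting each game once
per_game[, Cum_Shots := cumsum(Shot_Count), by = Name]
# Update join: copy each game's running total onto every matching row of dt
dt[per_game, on = .(Name, Game_ID), Cum_Shots := i.Cum_Shots]
print(dt)
```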

【Discussion】:

    【Solution 2】:

    Assuming you're already using data.table, you can do:

    Code:

    library(data.table)
    merge(dt, 
          dt[, Shot_Count[1], .(Name, Game_ID)][, .(CumSum_Shots = cumsum(V1), Game_ID), Name], 
          sort = FALSE)
    

    Output:

            Name Game_ID Shot_Count CumSum_Shots
     1: Player A       1         20           20
     2: Player B       1         15           15
     3: Player A       1         20           20
     4: Player A       2         13           33
     5: Player A       2         13           33
     6: Player B       2         35           50
     7: Player A       3         30           63
     8: Player B       3         20           70
     9: Player A       3         30           63
    10: Player A       4         12           75
    11: Player A       4         12           75
    12: Player B       4         10           80
    

    Explanation:

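    • dt[, Shot_Count[1], .(Name, Game_ID)]: takes the first shot count ([1]) for each Game_ID and Name. This is what implements the OP's requirement (count each game only once).
    • [, .(CumSum_Shots = cumsum(V1), Game_ID), Name]: computes the cumulative sum for each Name while keeping the Game_ID information.
    • merge(dt, ..., sort = FALSE): merges the result with the original data, keeping the original row order.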
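To see what each step above produces, the chain can be split into named intermediate tables. The names firsts, running and res are illustrative only, and the data is the input shown further down rebuilt with data.table():

```r
library(data.table)

dt <- data.table(
  Name = c("Player A", "Player B", "Player A", "Player A", "Player A", "Player B",
           "Player A", "Player B", "Player A", "Player A", "Player A", "Player B"),
  Game_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  Shot_Count = c(20L, 15L, 20L, 13L, 13L, 35L, 30L, 20L, 30L, 12L, 12L, 10L)
)

# Step 1: first Shot_Count per player and game (unnamed j, so the column is V1)
firsts <- dt[, Shot_Count[1], .(Name, Game_ID)]
# Step 2: running total per player, keeping Game_ID as a join column
running <- firsts[, .(CumSum_Shots = cumsum(V1), Game_ID), Name]
# Step 3: merge on the shared columns (Name, Game_ID) back onto every original row
res <- merge(dt, running, sort = FALSE)

print(firsts)   # 8 rows: one per player/game
print(running)  # Player A: 20, 33, 63, 75; Player B: 15, 50, 70, 80
print(res)
```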

    Input (dt):

    structure(list(Name = c("Player A", "Player B", "Player A", "Player A", 
    "Player A", "Player B", "Player A", "Player B", "Player A", "Player A", 
    "Player A", "Player B"), Game_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 
    3L, 3L, 3L, 4L, 4L, 4L), Shot_Count = c(20L, 15L, 20L, 13L, 13L, 
    35L, 30L, 20L, 30L, 12L, 12L, 10L)), .Names = c("Name", "Game_ID", 
    "Shot_Count"), row.names = c(NA, -12L), class = c("data.table", 
    "data.frame"))
    

    Edit:

    When chaining long data.table expressions, I prefer the magrittr pipe:

    library(magrittr)
    dt %>%
        .[, Shot_Count[1], .(Name, Game_ID)] %>%
        .[, .(CumSum_Shots = cumsum(V1), Game_ID), Name] %>%
        merge(dt, ., sort = FALSE)
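Without magrittr, data.table's own [ ] chaining expresses the same pipeline. This sketch swaps merge() for a join of the form X[dt, on = ...], which returns one row per row of dt in dt's original order; that swap is an editorial choice, not part of the answer, and the result's column order differs (Name, Game_ID, CumSum_Shots, Shot_Count).

```r
library(data.table)

dt <- data.table(
  Name = c("Player A", "Player B", "Player A", "Player A", "Player A", "Player B",
           "Player A", "Player B", "Player A", "Player A", "Player A", "Player B"),
  Game_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  Shot_Count = c(20L, 15L, 20L, 13L, 13L, 35L, 30L, 20L, 30L, 12L, 12L, 10L)
)

res <- dt[, .(V1 = Shot_Count[1]), by = .(Name, Game_ID)][   # one value per player/game
  , .(Game_ID, CumSum_Shots = cumsum(V1)), by = Name][         # running total per player
  dt, on = .(Name, Game_ID)]                                   # look up every row of dt
print(res)
```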
    

    【Discussion】:

      【Solution 3】:

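      Without a merge, you can cumsum the unique values (per Name, Game_ID and Shot_Count) and then rep them back out to the correct length. This relies on each player's rows being grouped and ordered by Game_ID, as they are in the example.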

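      library(data.table)
      dt[, CumSum_Shots2 := rep(
             cumsum(Shot_Count[!duplicated(Game_ID)]),        # running total over the first row of each game
             times = .SD[, .N, by = .(Game_ID, Shot_Count)]$N # number of rows each game occupies
           ),
         by = .(Name)]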
      
      dt
       #      Name Game_ID Shot_Count CumSum_Shots CumSum_Shots2
       #1: PlayerA       1         20           20            20
       #2: PlayerB       1         15           15            15
       #3: PlayerA       1         20           20            20
       #4: PlayerA       2         13           33            33
       #5: PlayerA       2         13           33            33
       #6: PlayerB       2         35           50            50
       #7: PlayerA       3         30           63            63
       #8: PlayerB       3         20           70            70
       #9: PlayerA       3         30           63            63
      #10: PlayerA       4         12           75            75
      #11: PlayerA       4         12           75            75
      #12: PlayerB       4         10           80            80
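The rep() approach depends on each player's rows being contiguous and ordered by game. A variant that avoids that assumption maps the running totals back with match() instead of rep(). The helper add_running_total and the column CumSum_Shots3 are hypothetical names for this sketch, and it still assumes Shot_Count is the same on every row of a given player's game.

```r
library(data.table)

dt <- data.table(
  Name = c("Player A", "Player B", "Player A", "Player A", "Player A", "Player B",
           "Player A", "Player B", "Player A", "Player A", "Player A", "Player B"),
  Game_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  Shot_Count = c(20L, 15L, 20L, 13L, 13L, 35L, 30L, 20L, 30L, 12L, 12L, 10L)
)

# Hypothetical helper: per player, count each game's Shot_Count once, in Game_ID order
add_running_total <- function(d) {
  d[, CumSum_Shots3 := {
    games <- sort(unique(Game_ID))              # this player's games, ascending
    shots <- Shot_Count[match(games, Game_ID)]  # first Shot_Count seen for each game
    cumsum(shots)[match(Game_ID, games)]        # running total mapped back to every row
  }, by = Name]
  invisible(d)                                  # := already modified d by reference
}

add_running_total(dt)

# Same totals when the rows arrive in reverse order
rev_dt <- dt[nrow(dt):1, .(Name, Game_ID, Shot_Count)]
add_running_total(rev_dt)
print(rev_dt)
```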
      

      【Discussion】:
