【Question title】: Separating the results from cumsum on two different groups into two different columns?
【Posted】: 2019-06-25 23:26:52
【Question】:

I am trying to find the cumulative sum for two different groups and list those sums in separate columns.

Here is the data frame, sorted by time:

time  group  value
0     A      0
0     B      0
0     A      0
1     A      0
1     B      1
1     B      0
2     B      1
2     A      1
2     A      1
2     A     -1
3     A      0
3     B      1

This is what I have so far to compute the cumsum by group and create a cumsum column:

df$cumsum <- ave(df$value, df$group, FUN=cumsum)

time  group  value  cumsum
0     A      0      0
0     B      0      0
0     A      0      0
1     A      0      0
1     B      1      1
1     B      0      1
2     B      1      2
2     A      1      1
2     A      1      2
2     A     -1      1
3     A      0      1
3     B      1      3

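How can I split the result into two columns, one for A and one for B? Alternatively, is it possible to compute a conditional cumsum? Either way, I would like the result to look like this: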

time  group  value  cumsum_A  cumsum_B
0      A      0     0         0
0      B      0     0         0
0      A      0     0         0
1      A      0     0         0
1      B      1     0         1
1      B      0     0         1
2      B      1     0         2
2      A      1     1         2
2      A      1     2         2
2      A     -1     1         2
3      A      0     1         2
3      B      1     1         3

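Thanks!

【Comments】:

  • This might help: stackoverflow.com/questions/27275363/…
  • Something like cumsum(replace(dat$value, dat$group == "A", 0)) ?
  • @StewartMacdonald - I don't think that does the same thing. That computes the cumsum within each group, rather than a cumsum that runs over every row and counts only one group's values.

Tags: r cumsum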


【Solution 1】:

You can first find the unique values of group and then loop over them with sapply/lapply to conditionally compute the cumsum for each one:

unique_val <- unique(df$group)
df[paste0("cumsum_", unique_val)] <- lapply(unique_val, 
                     function(x) cumsum((df$group == x) * df$value))

df
#   time group value cumsum_A cumsum_B
#1     0     A     0        0        0
#2     0     B     0        0        0
#3     0     A     0        0        0
#4     1     A     0        0        0
#5     1     B     1        0        1
#6     1     B     0        0        1
#7     2     B     1        0        2
#8     2     A     1        1        2
#9     2     A     1        2        2
#10    2     A    -1        1        2
#11    3     A     0        1        2
#12    3     B     1        1        3
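
To make the mechanism behind `(df$group == x) * df$value` explicit, here is a minimal sketch that walks through the intermediate steps on the question's data. The variable names (`is_a`, `masked`, `cumsum_a`, `per_group`) are illustrative only and do not appear in the answer.

```r
# Data from the question, as plain vectors.
group <- c("A", "B", "A", "A", "B", "B", "B", "A", "A", "A", "A", "B")
value <- c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, -1L, 0L, 1L)

# Step 1: a logical mask that is TRUE only on rows of group "A".
is_a <- group == "A"

# Step 2: multiplication coerces TRUE/FALSE to 1/0,
# which zeroes out the values belonging to the other group.
masked <- is_a * value

# Step 3: the running total changes only on "A" rows and is carried forward elsewhere.
cumsum_a <- cumsum(masked)

# sapply() over the group labels returns a matrix with one named column per group.
per_group <- sapply(unique(group), function(g) cumsum((group == g) * value))
print(per_group)
```

With `sapply` the result is a matrix whose columns are named after the groups, whereas the `lapply` form above yields a list that can be assigned directly to new data frame columns.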


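    【Solution 2】:

    You can also use if_else to replace value with 0 whenever it does not belong to the desired group, as shown below. dplyr is not strictly necessary here (you could use base::ifelse and avoid mutate).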

    library(tidyverse)
    df1 <- structure(list(time = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), group = c("A", "B", "A", "A", "B", "B", "B", "A", "A", "A", "A", "B"), value = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, -1L, 0L, 1L)), class = "data.frame", row.names = c(NA, -12L))
    df1 %>%
      mutate(
        cumsum_A = cumsum(if_else(group == "A", value, 0L)),
        cumsum_B = cumsum(if_else(group == "B", value, 0L))
      )
    #>    time group value cumsum_A cumsum_B
    #> 1     0     A     0        0        0
    #> 2     0     B     0        0        0
    #> 3     0     A     0        0        0
    #> 4     1     A     0        0        0
    #> 5     1     B     1        0        1
    #> 6     1     B     0        0        1
    #> 7     2     B     1        0        2
    #> 8     2     A     1        1        2
    #> 9     2     A     1        2        2
    #> 10    2     A    -1        1        2
    #> 11    3     A     0        1        2
    #> 12    3     B     1        1        3
    

    Created on 2019-06-25 by the reprex package (v0.3.0)
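
The remark that dplyr is not required can be illustrated with a base R sketch. This is one possible reading of that remark, using `transform()` and `ifelse()` in place of `mutate()` and `if_else()`; the result name `res_base` is an assumption made for this example.

```r
# Same data as in the answer, rebuilt so the example is self-contained.
df1 <- data.frame(
  time  = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  group = c("A", "B", "A", "A", "B", "B", "B", "A", "A", "A", "A", "B"),
  value = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, -1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# transform() evaluates the expressions with the data frame's columns in scope,
# so group and value can be referenced directly, much like inside mutate().
res_base <- transform(
  df1,
  cumsum_A = cumsum(ifelse(group == "A", value, 0L)),
  cumsum_B = cumsum(ifelse(group == "B", value, 0L))
)
print(res_base)
```

One difference worth knowing: `dplyr::if_else` is strict about both branches having compatible types, while `base::ifelse` silently coerces, which is why `0L` is used here to keep the columns integer.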


      【Solution 3】:

      Here is an option with table and colCumsums:

      library(matrixStats)
      nm1 <- paste0("cumsum_", unique(df1$group))
      df1[nm1] <- colCumsums(table(seq_len(nrow(df1)),df1$group) * df1$value)
      df1
      #   time group value cumsum_A cumsum_B
      #1     0     A     0        0        0
      #2     0     B     0        0        0
      #3     0     A     0        0        0
      #4     1     A     0        0        0
      #5     1     B     1        0        1
      #6     1     B     0        0        1
      #7     2     B     1        0        2
      #8     2     A     1        1        2
      #9     2     A     1        2        2
      #10    2     A    -1        1        2
      #11    3     A     0        1        2
      #12    3     B     1        1        3
      

      Another option is model.matrix:

      colCumsums((model.matrix(~  group -1, df1)) * df1$value)
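
The `model.matrix` call works because `~ group - 1` produces one 0/1 indicator column per group, so multiplying by `value` zeroes out the other group's rows before the cumulative sum. A minimal sketch of the same idea without matrixStats, assuming base `apply()` as a stand-in for `colCumsums`; the names `ind`, `sums` and `res_mm` are illustrative only.

```r
# Same data as in the answer, rebuilt so the example is self-contained.
df1 <- data.frame(
  time  = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  group = c("A", "B", "A", "A", "B", "B", "B", "A", "A", "A", "A", "B"),
  value = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, -1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Indicator matrix: columns groupA and groupB, 1 where the row belongs to that group.
ind <- model.matrix(~ group - 1, df1)

# value is recycled down each column, then cumsum() is applied column by column.
sums <- apply(ind * df1$value, 2, cumsum)

# Rename groupA/groupB to cumsum_A/cumsum_B and attach to the original data.
colnames(sums) <- sub("^group", "cumsum_", colnames(sums))
res_mm <- cbind(df1, sums)
print(res_mm)
```

Because the column names come from the indicator matrix itself, the labels stay aligned with the data even when the groups do not first appear in alphabetical order.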
      

      Or model.matrix with tidyverse:

      library(tidyverse)
      df1 %>%
          model.matrix( ~group - 1, .) %>%
          as_tibble %>% 
          mutate_all(~ cumsum(. * df1$value)) %>% 
          rename_all(~ str_replace(., "group", "cumsum")) %>%
          bind_cols(df1, .)
      #    time group value cumsumA cumsumB
      #1     0     A     0       0       0
      #2     0     B     0       0       0
      #3     0     A     0       0       0
      #4     1     A     0       0       0
      #5     1     B     1       0       1
      #6     1     B     0       0       1
      #7     2     B     1       0       2
      #8     2     A     1       1       2
      #9     2     A     1       2       2
      #10    2     A    -1       1       2
      #11    3     A     0       1       2
      #12    3     B     1       1       3
      

      Or using count and spread:

      df1 %>%
            mutate(rn = row_number()) %>%
            dplyr::count(group, rn) %>% 
            mutate(group = str_c("cumsum", group)) %>%
            spread(group, n, fill = 0) %>% 
            mutate_at(-1, ~ cumsum(. * df1$value)) %>% 
            select(-rn) %>%
            bind_cols(df1, .)
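
In more recent tidyr and dplyr releases, `spread` and `mutate_at` are superseded by `pivot_wider` and `across`. The following is a sketch of the same reshaping idea with those functions, not the answer's own code. It assumes tidyr 1.1.0 or later (for a scalar `values_fill`) and dplyr 1.0.0 or later (for `across`); the helper columns `rn`, `grp` and `flag` and the result name `res_wide` are made up for the example.

```r
library(dplyr)
library(tidyr)

# Same data as in the answer, rebuilt so the example is self-contained.
df1 <- data.frame(
  time  = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  group = c("A", "B", "A", "A", "B", "B", "B", "A", "A", "A", "A", "B"),
  value = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, -1L, 0L, 1L),
  stringsAsFactors = FALSE
)

res_wide <- df1 %>%
  # rn keeps every row distinct; grp is a copy of group that gets spread into columns;
  # flag marks membership and becomes the 0/1 indicator after widening.
  mutate(rn = row_number(), grp = group, flag = 1L) %>%
  pivot_wider(names_from = grp, values_from = flag,
              values_fill = 0L, names_prefix = "cumsum_") %>%
  # Multiply each indicator column by value, then take the running total.
  mutate(across(starts_with("cumsum_"), ~ cumsum(.x * value))) %>%
  select(-rn)

print(res_wide)
```

Copying `group` into `grp` before widening keeps the original `group` column in the output, so no `bind_cols` step is needed afterwards.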
      

      Data

      df1 <- structure(list(time = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
      3L, 3L), group = c("A", "B", "A", "A", "B", "B", "B", "A", "A", 
      "A", "A", "B"), value = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 
      -1L, 0L, 1L)), class = "data.frame", row.names = c(NA, -12L))
      
