【Question title】: Count unique values before a row itself for each group in data.table
【Posted】: 2022-06-22 18:49:02
【Question description】:

I have a data.table like this:

library(data.table)

# Two products, each with seven daily rows and a changing owner
df <- data.table(Date = c(seq.Date(from = as.Date('2022-01-01'), to = as.Date('2022-01-07'), by = 1),
                          seq.Date(from = as.Date('2022-01-01'), to = as.Date('2022-01-07'), by = 1)),
           Product = c(rep('A', 7), rep('B', 7)),
           Owner = c(c('X','X','Y','Y','Z','Z','Z'), c('M','M','M','M','N','O','O')))

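Product is my grouping variable here. I want to create a column that, for each row, counts the distinct owners the product had before the current row's owner.

This is what I mean: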

   Date       Product Owner BeforeOwnerCount
   <date>     <chr>   <chr>            <dbl>
 1 2022-01-01 A       X                    0
 2 2022-01-02 A       X                    0
 3 2022-01-03 A       Y                    1
 4 2022-01-04 A       Y                    1
 5 2022-01-05 A       Z                    2
 6 2022-01-06 A       Z                    2
 7 2022-01-07 A       Z                    2
 8 2022-01-01 B       M                    0
 9 2022-01-02 B       M                    0
10 2022-01-03 B       M                    0
11 2022-01-04 B       M                    0
12 2022-01-05 B       N                    1
13 2022-01-06 B       O                    2
14 2022-01-07 B       O                    2

dplyr verbs are also welcome.

Thanks in advance.

【Comments】:

    Tags: r dplyr data.table


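    【Solution 1】:

    Assuming the Date column is in chronological order within each Product (if not, sort first, e.g. with setorder(df, Product, Date)). rleid() numbers the runs of consecutive identical owners, so subtracting 1 gives the number of owner changes before each row: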

    df[, BOC := rleid(Owner) - 1, by = Product]
    
              Date Product Owner BOC
     1: 2022-01-01       A     X   0
     2: 2022-01-02       A     X   0
     3: 2022-01-03       A     Y   1
     4: 2022-01-04       A     Y   1
     5: 2022-01-05       A     Z   2
     6: 2022-01-06       A     Z   2
     7: 2022-01-07       A     Z   2
     8: 2022-01-01       B     M   0
     9: 2022-01-02       B     M   0
    10: 2022-01-03       B     M   0
    11: 2022-01-04       B     M   0
    12: 2022-01-05       B     N   1
    13: 2022-01-06       B     O   2
    14: 2022-01-07       B     O   2
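
The rleid() approach counts runs, so it matches "distinct owners before this row" only while no owner comes back after a change. The following is a minimal sketch of that difference, not part of the original answer. The table `dt` and the column names `runs_before` and `distinct_before` are made up for illustration, and `cumsum(!duplicated(Owner)) - 1L` is just one possible reading of the question (distinct owners seen so far, other than the current one).

```r
library(data.table)

# Hypothetical data: owner X comes back after Y has owned the product
dt <- data.table(
  Product = rep("A", 5),
  Owner   = c("X", "X", "Y", "X", "X")
)

dt[, `:=`(
  # number of owner changes before this row (run-length based)
  runs_before     = rleid(Owner) - 1L,
  # distinct owners seen up to this row, minus the current owner
  distinct_before = cumsum(!duplicated(Owner)) - 1L
), by = Product]

print(dt)
```

On the data in the question both columns give the same result, since no owner reappears there.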
    

    【Comments】:

    • I didn't expect the syntax to be this simple, thanks.
    【Solution 2】:

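    Using dplyr together with factor. The levels are set in order of first appearance, because as.factor() would number the owners alphabetically instead:

    library(dplyr)
    library(data.table)
    # setDF() converts the data.table to a data.frame by reference
    setDF(df) %>%
      group_by(Product) %>%
      mutate(BeforeOwnerCount = as.numeric(factor(Owner, levels = unique(Owner))) - 1)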
    

    输出:

    # A tibble: 14 × 4
    # Groups:   Product [2]
       Date       Product Owner BeforeOwnerCount
       <date>     <chr>   <chr>            <dbl>
     1 2022-01-01 A       X                    0
     2 2022-01-02 A       X                    0
     3 2022-01-03 A       Y                    1
     4 2022-01-04 A       Y                    1
     5 2022-01-05 A       Z                    2
     6 2022-01-06 A       Z                    2
     7 2022-01-07 A       Z                    2
     8 2022-01-01 B       M                    0
     9 2022-01-02 B       M                    0
    10 2022-01-03 B       M                    0
    11 2022-01-04 B       M                    0
    12 2022-01-05 B       N                    1
    13 2022-01-06 B       O                    2
    14 2022-01-07 B       O                    2
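
One thing to watch with a factor-based count: as.factor() sorts its levels, so the integer codes reflect alphabetical rank rather than order of appearance. The sketch below uses a made-up data frame `toy` whose owners do not appear in alphabetical order. The column names `alphabetical` and `by_appearance` are illustrative only.

```r
library(dplyr)

# Hypothetical data: owners appear in reverse alphabetical order
toy <- data.frame(
  Product = rep("A", 4),
  Owner   = c("Z", "Z", "Y", "X")
)

res <- toy %>%
  group_by(Product) %>%
  mutate(
    # codes follow the sorted levels X < Y < Z
    alphabetical  = as.numeric(as.factor(Owner)) - 1,
    # codes follow the order in which each owner first appears
    by_appearance = match(Owner, unique(Owner)) - 1
  ) %>%
  ungroup()

print(res)
```

Here `by_appearance` gives 0, 0, 1, 2, while `alphabetical` gives 2, 2, 1, 0. The owners in the question happen to appear in alphabetical order, which is why both variants agree there.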
    
