[Question title]: Removing Pairs of Reversal Transactions in R
[Posted]: 2021-05-19 11:51:47
[Question description]:

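I have the following transaction data covering three months (the Month column takes the values 2, 3 and 4, i.e. February to April):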

tab.m <- structure(list(Date = structure(c(1580947200, 1581033600, 1581120000,
1581206400, 1581292800, 1581379200, 1581465600, 1581552000, 1581638400,
1583798400, 1583884800, 1583971200, 1584057600, 1584144000, 1584230400,
1584316800, 1584403200, 1587168000, 1587254400, 1587340800, 1587427200,
1587513600, 1587600000, 1587686400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Month = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4), `Product Type` = c("LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX"), Account = c(931,
931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931,
931, 931, 931, 931, 931, 931, 931, 931, 931, 931), Subsidiary = c(124,
124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124,
124, 124, 124, 124, 124, 124, 124, 124, 124, 124), Description = c("Transaction",
"Transaction X", "Transaction", "Transaction", "Transaction X",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction"),
    `Policy Number` = c(42057926, 42057926, 42057926, 42057926,
    42057926, 42057926, 42057926, 42057926, 42057926, 42060466,
    42060466, 42060466, 42060466, 42060466, 42060466, 42060466,
    42060466, 42060467, 42060467, 42060467, 42060467, 42060467,
    42060467, 42060467), Amount = c(10, -10, 20, -20, 30, 24,
    23, 22, -0.56, 1, -1, 2, -2, 2, 3, 4, -1, 3, -3, -3, -3,
    -3, -3, -3)), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"))

I use the split() function to group the data frame of transactions by month and policy number:

grouped = split(tab.m,list(tab.m$Month,tab.m$`Policy Number`))

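Within each group there are pairs of reversal transactions: one row with a positive (or negative) amount and another row, somewhere before or after it, with exactly the opposite amount. I want to remove these pairs from each group and then combine the groups back into a single data frame. The positive transaction may come first and the negative one second, or vice versa.

Note that the two rows of a reversal pair are not always adjacent.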
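
The split / apply / combine workflow described above could be sketched in base R as follows. This is only an illustration under stated assumptions: `drop_reversal_pairs`, the `Policy` column and the toy data are hypothetical names that do not appear in the question, and amounts are assumed to match exactly (no floating-point rounding differences). Within one group, the first positive row of a given magnitude is matched with the first negative row of the same magnitude, and so on, so pairs do not need to be adjacent.

```r
# Drop matched +x / -x pairs from one group; unmatched rows are kept.
# Hypothetical helper, assumes a numeric column named Amount without NAs.
drop_reversal_pairs <- function(g) {
  if (nrow(g) == 0) return(g)
  sgn <- sign(g$Amount)
  mag <- abs(g$Amount)
  # k = running count of rows sharing the same sign and magnitude
  k <- ave(seq_along(mag), sgn, mag, FUN = seq_along)
  # number of positive / negative rows for each magnitude
  n_pos <- ave(as.numeric(sgn == 1), mag, FUN = sum)
  n_neg <- ave(as.numeric(sgn == -1), mag, FUN = sum)
  # the first min(n_pos, n_neg) rows of each sign form pairs and are dropped
  g[k > pmin(n_pos, n_neg), , drop = FALSE]
}

# Toy data (hypothetical): one group with a non-adjacent pair,
# one group with a single +3 and two -3 rows.
toy <- data.frame(
  Month  = c(2, 2, 2, 2, 4, 4, 4),
  Policy = c(1, 1, 1, 1, 2, 2, 2),
  Amount = c(10, 24, -10, 30, 3, -3, -3)
)

# drop = TRUE discards empty Month x Policy combinations
groups  <- split(toy, list(toy$Month, toy$Policy), drop = TRUE)
cleaned <- do.call(rbind, lapply(groups, drop_reversal_pairs))
rownames(cleaned) <- NULL
cleaned
```

In the toy data the 10 / -10 pair is removed even though a 24 sits between the two rows, and only one of the two -3 rows is removed together with the single +3.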

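[Question comments]:

  • There is one transaction of +3.00 followed by several transactions of -3.00. How do you determine which one is the reversal?
  • It does not matter which of the -3.00 transactions is removed, as long as exactly one pair is removed.
  • By a pair I mean one -3 row together with one +3 row.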

Tags: r dataframe dplyr transactions data-cleaning


[Solution 1]:

Here is one approach with dplyr:

library(dplyr)
tab.m %>% 
  group_by(Month,`Policy Number`) %>%
  mutate(id = rep(seq_along(rle(abs(Amount))$lengths),   #Create a temporary grouping id using run length encoding
                  times = rle(abs(Amount))$lengths)) %>% #such that sets of Amounts with the same absolute value are together
  group_by(Month,`Policy Number`,id) %>% #Group by this new temporary id
  mutate(temp = min(table(factor(sign(Amount),            #Create a new temporary value that calculates the number
                                 levels = c(-1,1))))) %>% #of positives and negatives, the minimum value of which can be removed
  group_by(Month,`Policy Number`, id, Amount) %>% #Group by id and Amount
  dplyr::filter(n() > temp) %>% #Filter out values less than the number to remove
  dplyr::select(-c(id,temp)) #Remove temporary columns
      id Date                Month `Product Type` Account Subsidiary Description   `Policy Number` Amount
   <int> <dttm>              <dbl> <chr>            <dbl>      <dbl> <chr>                   <dbl>  <dbl>
 1     3 2020-02-10 00:00:00     2 LIZX               931        124 Transaction X        42057926  30   
 2     4 2020-02-11 00:00:00     2 LIZX               931        124 Transaction          42057926  24   
 3     5 2020-02-12 00:00:00     2 LIZX               931        124 Transaction          42057926  23   
 4     6 2020-02-13 00:00:00     2 LIZX               931        124 Transaction          42057926  22   
 5     7 2020-02-14 00:00:00     2 LIZX               931        124 Transaction          42057926  -0.56
 6     2 2020-03-12 00:00:00     3 LIZX               931        124 Transaction          42060466   2   
 7     2 2020-03-14 00:00:00     3 LIZX               931        124 Transaction          42060466   2   
 8     3 2020-03-15 00:00:00     3 LIZX               931        124 Transaction          42060466   3   
 9     4 2020-03-16 00:00:00     3 LIZX               931        124 Transaction          42060466   4   
10     5 2020-03-17 00:00:00     3 LIZX               931        124 Transaction          42060466  -1   
11     1 2020-04-19 00:00:00     4 LIZX               931        124 Transaction          42060467  -3   
12     1 2020-04-20 00:00:00     4 LIZX               931        124 Transaction          42060467  -3   
13     1 2020-04-21 00:00:00     4 LIZX               931        124 Transaction          42060467  -3   
14     1 2020-04-22 00:00:00     4 LIZX               931        124 Transaction          42060467  -3   
15     1 2020-04-23 00:00:00     4 LIZX               931        124 Transaction          42060467  -3   
16     1 2020-04-24 00:00:00     4 LIZX               931        124 Transaction          42060467  -3   
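
In the output above, both +2 rows of the March group and all six -3 rows of the April group are retained, although the question asks for one +2 and one -3 to go away with their opposite rows. The run-length grouping also only sees rows of equal magnitude that sit next to each other. The following is an alternative sketch, not the answerer's method: it numbers the rows within each sign and magnitude and drops the rows whose number occurs for both signs. The toy data and the temporary columns `mag`, `sgn` and `k` are assumptions made for illustration, and amounts are assumed to match exactly.

```r
library(dplyr)

# Toy data (hypothetical), mirroring the amounts of the March group
# and a shortened version of the April group
toy <- tibble(
  Month           = c(3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4),
  `Policy Number` = c(rep(42060466, 8), rep(42060467, 3)),
  Amount          = c(1, -1, 2, -2, 2, 3, 4, -1, 3, -3, -3)
)

cleaned <- toy %>%
  mutate(mag = abs(Amount), sgn = sign(Amount)) %>%
  # number the rows within each sign and magnitude: 1st +2, 2nd +2, 1st -2, ...
  group_by(Month, `Policy Number`, mag, sgn) %>%
  mutate(k = row_number()) %>%
  # the k-th positive and the k-th negative row of a magnitude form a pair
  group_by(Month, `Policy Number`, mag, k) %>%
  filter(n() == 1) %>%   # keep only rows without a partner
  ungroup() %>%
  select(-mag, -sgn, -k)

cleaned$Amount
```

Because matching is by position within each sign rather than by adjacency, the trailing -1 in March stays (its only possible partner is already used by the first -1), one +2 remains, and one -3 remains in April.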

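[Comments]:

  • Do you have any suggestions on how I could apply the above code only to amounts greater than 3? Great answer, thank you!!
  • I believe your earlier suggestion filters after the duplicates have been removed. I would like to filter the rows before checking for duplicates, then apply the duplicate removal and bind its result to the filtered-out data.
  • Add dplyr::filter(abs(Amount) > 3) %>% before the first group_by. This drops every row whose absolute amount is 3 or less.
  • This works very well for the example above, but it does not seem to work when a group (i.e. grouped by month and policy number) contains several other reversal pairs. Having looked at your code, does it remove only one pair per group?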
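
The request in the comments (clean only the rows above a threshold, then bind the untouched rows back) could be sketched as below. `clean_above` and `naive_cleaner` are hypothetical names. The cleaner shown is a deliberately simple placeholder that drops every row whose exact opposite amount also occurs, and any real pair-removal step can be passed in its place. The sketch assumes the Amount column has no missing values.

```r
# Apply `cleaner` only to rows with |Amount| > threshold, then bind the
# untouched rows back and restore the original row order.
clean_above <- function(df, threshold, cleaner) {
  df$.row <- seq_len(nrow(df))            # remember the original order
  big     <- abs(df$Amount) > threshold
  out     <- rbind(cleaner(df[big, , drop = FALSE]),
                   df[!big, , drop = FALSE])
  out     <- out[order(out$.row), , drop = FALSE]
  out$.row <- NULL
  rownames(out) <- NULL
  out
}

# Placeholder cleaner for illustration only: removes every row whose
# exact opposite amount is also present. Substitute the real step here.
naive_cleaner <- function(d) {
  d[!((-d$Amount) %in% d$Amount), , drop = FALSE]
}

toy <- data.frame(Amount = c(1, -1, 5, -5, 7, 3, -3))
res <- clean_above(toy, 3, naive_cleaner)
res$Amount
```

Here the 5 / -5 pair is removed, while 1 / -1 and 3 / -3 are left alone because their absolute value does not exceed the threshold.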