【Question title】: Summarizing column values by groups created based on conditionals
【Posted】: 2019-04-30 04:57:55
【Question】:

I have the following dataset:

Adv_Code  Change_Dt   Change_Month  April_OPN  May_OPN  June_OPN  July_OPN  August_OPN  September_OPN  October_OPN  November_OPN  December_OPN  January_OPN  February_OPN  March_OPN
A201      12/04/2018  April         0          0        1         0         0           0              0            0             0             0            0             0
A198      27/07/2018  August        2          0        0         1         2           0              5            0             0             0            0             0
S1212     10/11/2018  November      0          3        4         0         0           3              0            1             0             0            0             0

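I need to group the transactions of each month into N and V based on Change_Month and Change_Dt. When the date is after the 15th of the month, Change_Month falls in the next month; otherwise it is the same month as Change_Dt. For example, for A198 the Change_Month is August, so April_OPN to July_OPN are grouped under the N category and the remaining months under the V category. For S1212, since the date is before the 15th, April to October OPN fall under N and the remaining months under V.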
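The month-assignment rule described above can be sketched in base R as follows. This is only an illustration, not part of the original post: the variable names (`change_dt`, `effective`, `change_month`) are made up for the example, and it assumes that a date falling exactly on the 15th stays in its own month, which the description does not spell out.

```r
# Sample dates from the table above, parsed as day/month/year.
change_dt <- as.Date(c("12/04/2018", "27/07/2018", "10/11/2018"),
                     format = "%d/%m/%Y")

day   <- as.integer(format(change_dt, "%d"))
month <- as.integer(format(change_dt, "%m"))

# Assumption: strictly after the 15th rolls over to the next month
# (December wraps around to January); the 15th itself stays put.
effective <- ifelse(day > 15, month %% 12 + 1, month)

# month.name is the built-in vector of English month names.
change_month <- month.name[effective]
change_month  # April, August, November
```

For the three sample rows this reproduces the Change_Month column shown in the table.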

Expected output:

Adv_Code    Change_Dt   Change_Month    N_OPN   V_OPN
A201        12/04/2018  April           0       1   
A198        27/07/2018  August          3       7
S1212       10/11/2018  November        10      1   

Can someone help me with this?

Below is the code to reproduce the dataset:

Adv_Code <- c('A201','A198','S1212')
# Dates are day/month/year, so as.Date() needs an explicit format
Change_Dt <- as.Date(c('12/04/2018','27/07/2018','10/11/2018'), format = '%d/%m/%Y')
Change_Month <- c('April','August','November')
# Column names and values follow the table above (_OPN suffix, June = 1 for A201)
April_OPN <- c(0,2,0)
May_OPN <- c(0,0,3)
June_OPN <- c(1,0,4)
July_OPN <- c(0,1,0)
August_OPN <- c(0,2,0)
September_OPN <- c(0,0,3)
October_OPN <- c(0,5,0)
November_OPN <- c(0,0,1)
December_OPN <- c(0,0,0)
January_OPN <- c(0,0,0)
February_OPN <- c(0,0,0)
March_OPN <- c(0,0,0)

df <- data.frame(Adv_Code,Change_Dt,Change_Month,April_OPN,May_OPN,June_OPN,July_OPN,August_OPN,September_OPN,October_OPN,November_OPN,December_OPN,January_OPN,February_OPN,March_OPN)

【Comments】:

  • Can you post what you have tried so far?
  • I don't even know where to start!

Tags: r date reshape dplyr


【Solution 1】:

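We can use apply with MARGIN = 1 (row-wise). For each row, store the column number where that row's Change_Month occurs (inds). Take the first two characters of Change_Dt as the day and check whether it is greater than 15. Depending on the result, split the month columns into two parts, sum each part, and add the two sums as new columns.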

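col <- 4 # column number where the month columns start

df[c("N_OPN", "V_OPN")] <- t(apply(df, 1, function(x) {
  # position of this row's Change_Month among the column names
  inds <- grep(x[["Change_Month"]], names(x))
  if (as.numeric(substr(x["Change_Dt"], 1, 2)) > 15)
    # day after the 15th: N ends before Change_Month, V starts at it
    c(sum(as.numeric(x[col:pmax(col, inds - 1)])),
      sum(as.numeric(x[inds:ncol(df)])))
  else
    # otherwise: N includes Change_Month, V starts the month after
    c(sum(as.numeric(x[col:inds])),
      sum(as.numeric(x[pmin(ncol(df), inds + 1):ncol(df)])))
}))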


df[c(1:3, 16, 17)]
#  Adv_Code  Change_Dt Change_Month N_OPN V_OPN
#1     A201 12/04/2018        April     0     1
#2     A198 27/07/2018       August     3     7
#3    S1212 10/11/2018     November    11     0

Data

df <- structure(list(Adv_Code = structure(c(2L, 1L, 3L), .Label = 
c("A198", 
"A201", "S1212"), class = "factor"), Change_Dt = structure(c(2L, 
3L, 1L), .Label = c("10/11/2018", "12/04/2018", "27/07/2018"), class = 
"factor"), 
Change_Month = structure(1:3, .Label = c("April", "August", 
"November"), class = "factor"), April_OPN = c(0L, 2L, 0L), 
May_OPN = c(0L, 0L, 3L), June_OPN = c(1L, 0L, 4L), July_OPN = c(0L, 
1L, 0L), August_OPN = c(0L, 2L, 0L), September_OPN = c(0L, 
0L, 3L), October_OPN = c(0L, 5L, 0L), November_OPN = c(0L, 
0L, 1L), December_OPN = c(0L, 0L, 0L), January_OPN = c(0L, 
0L, 0L), February_OPN = c(0L, 0L, 0L), March_OPN = c(0L, 
0L, 0L)), class = "data.frame", row.names = c(NA, -3L))

【Comments】:

  • Hi Ronak, the logic doesn't seem to work for the following combinations: Change_Month April 7/4/2017 April 7/4/2017 May 16/04/2017 January 4/1/2018