【Posted】: 2020-10-23 08:49:15
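【Question】:
I have a large data frame and want to select the rows that satisfy a condition on its date columns. The data frame looks like this: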
library(tidyverse)
library(lubridate)
curdate <- seq(as.Date("2000/1/1"), by = "month", length.out = 24)
expdate <- rep(seq(as.Date("2000/3/1"), by = "quarter", length.out = 12),2)
afactor <- rep(c("C","P"),12)
anumber <- runif(24)
df<-data.frame(curdate, expdate, afactor, anumber)
df$expdate[12]<-as.Date("2001-02-01")
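I want the rows where the month of the expiry date (expdate) is two months after the month of the current date (curdate). Only the month numbers are compared; the year is ignored. In this example, that should select these five rows (rows 1, 7, 12, 13, and 19):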
curdate expdate afactor anumber
2000-01-01 2000-03-01 C 0.6832251
2000-07-01 2001-09-01 C 0.2671076
2001-01-01 2000-03-01 C 0.2097065
2001-07-01 2001-09-01 C 0.9258450
2000-12-01 2001-02-01 P 0.4903951
At first I used the following:
df_select1 <- df %>%
  group_by(curdate, afactor) %>%
  filter(month(expdate) == month(curdate) + 2)
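But it misses the cases where curdate falls in November or December, because month(curdate)+2 is then 13 or 14 and never matches a month number. Here, for example, it misses the row with curdate 2000-12-01. So I tried to add a condition to handle these cases, and wrote: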
df_select2 <- df %>%
  group_by(curdate, afactor) %>%
  if_else(month(curdate) < 11,
          filter(month(expdate) == month(curdate) + 2),
          filter(month(expdate) == month(curdate) - 10))
But I get the following error: condition must be a logical vector, not a grouped_df/tbl_df/tbl/data.frame object.
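The error arises because the pipe passes the grouped data frame as the first argument of if_else(), which is its condition argument, so if_else() receives a data frame where it expects a logical vector. A minimal sketch of one way around this, assuming the intent is to choose the target month row by row, is to move if_else() inside a single filter() call. The data frame is rebuilt as in the question, without the random anumber column, so that the block runs on its own.

```r
library(dplyr)
library(lubridate)

# Same construction as in the question, minus the random anumber column
df <- data.frame(
  curdate = seq(as.Date("2000-01-01"), by = "month", length.out = 24),
  expdate = rep(seq(as.Date("2000-03-01"), by = "quarter", length.out = 12), 2),
  afactor = rep(c("C", "P"), 12)
)
df$expdate[12] <- as.Date("2001-02-01")

# if_else() now receives a logical vector (one value per row) as its
# condition and returns the target month number for each row
df_select <- df %>%
  filter(month(expdate) == if_else(month(curdate) < 11,
                                   month(curdate) + 2,
                                   month(curdate) - 10))
```

Here filter() evaluates the comparison once for the whole column, and if_else() only supplies the month to compare against, so no data frame ever reaches its condition argument.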
I found the following workaround, but there must be a shorter way:
df_select1 <- df %>%
  group_by(curdate, afactor) %>%
  filter(month(curdate) < 11) %>%
  filter(month(expdate) == month(curdate) + 2)
df_select2 <- df %>%
  group_by(curdate, afactor) %>%
  filter(month(curdate) > 10) %>%
  filter(month(expdate) == month(curdate) - 10)
df_select <- full_join(df_select1, df_select2)
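The two branches above can also be folded into one expression. The sketch below shows two possible ways, neither of which comes from the original post: modular arithmetic on the month number, where (month(curdate) + 1) %% 12 + 1 maps months 1 to 10 onto 3 to 12, November onto 1 and December onto 2, and lubridate's %m+% operator, which shifts the date itself by two calendar months. The names df_mod and df_shift are illustrative, and the data frame is rebuilt without the anumber column so that the block is self-contained.

```r
library(dplyr)
library(lubridate)

# Same construction as in the question, minus the random anumber column
df <- data.frame(
  curdate = seq(as.Date("2000-01-01"), by = "month", length.out = 24),
  expdate = rep(seq(as.Date("2000-03-01"), by = "quarter", length.out = 12), 2),
  afactor = rep(c("C", "P"), 12)
)
df$expdate[12] <- as.Date("2001-02-01")

# Option 1: wrap the month number around with modular arithmetic
df_mod <- df %>%
  filter(month(expdate) == (month(curdate) + 1) %% 12 + 1)

# Option 2: shift curdate forward by two calendar months, then compare
# month numbers only (the year is still ignored)
df_shift <- df %>%
  filter(month(expdate) == month(curdate %m+% months(2)))
```

Both versions drop group_by(), since the condition is evaluated row by row and does not depend on the grouping. The result keeps the original row order (rows 1, 7, 12, 13, 19) instead of the order produced by full_join().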
【Comments】:
Tags: r dataframe if-statement filter pipes-filters