【Question Title】: Nested conditionals in dplyr ifelse statement
【Posted】: 2017-10-11 19:46:41
【Question】:

I am using dplyr and ifelse to create a new column based on two conditions in the following data.

dat <- structure(list(GenIndID = c("BHS_034", "BHS_034", "BHS_068", 
"BHS_068", "BHS_068", "BHS_068", "BHS_068", "BHS_068", "BHS_068", 
"BHS_068", "BHS_068"), IndID = c("BHS_034_A", "BHS_034_A", "BHS_068_A", 
"BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", 
"BHS_068_A", "BHS_068_A", "BHS_068_A"), Fate = c("Mort", "Mort", 
"Alive", "Alive", "Alive", "Alive", "Alive", "Alive", "Alive", 
"Alive", "Alive"), Status = c("Alive", "Mort", "Alive", "Alive", 
"MIA", "Alive", "MIA", "Alive", "MIA", "Alive", "Alive"), Type = c("Linked", 
"Linked", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", "SOB", 
"SOB"), SurveyID = c("GYA13-1", "GYA14-1", "GYA13-1", "GYA14-1", 
"GYA14-2", "GYA15-1", "GYA16-1", "GYA16-2", "GYA17-1", "GYA17-3", 
"GYA15-2"), SurveyDt = structure(c(1379570400, 1407477600, 1379570400, 
1407477600, 1409896800, NA, 1462946400, 1474351200, 1495519200, 
1507010400, 1441951200), tzone = "", class = c("POSIXct", "POSIXt"
))), row.names = c(NA, 11L), .Names = c("GenIndID", "IndID", 
"Fate", "Status", "Type", "SurveyID", "SurveyDt"), class = "data.frame")

> dat
   GenIndID     IndID  Fate Status   Type SurveyID   SurveyDt
1   BHS_034 BHS_034_A  Mort  Alive Linked  GYA13-1 2013-09-19
2   BHS_034 BHS_034_A  Mort   Mort Linked  GYA14-1 2014-08-08
3   BHS_068 BHS_068_A Alive  Alive    SOB  GYA13-1 2013-09-19
4   BHS_068 BHS_068_A Alive  Alive    SOB  GYA14-1 2014-08-08
5   BHS_068 BHS_068_A Alive    MIA    SOB  GYA14-2 2014-09-05
6   BHS_068 BHS_068_A Alive  Alive    SOB  GYA15-1       <NA>
7   BHS_068 BHS_068_A Alive    MIA    SOB  GYA16-1 2016-05-11
8   BHS_068 BHS_068_A Alive  Alive    SOB  GYA16-2 2016-09-20
9   BHS_068 BHS_068_A Alive    MIA    SOB  GYA17-1 2017-05-23
10  BHS_068 BHS_068_A Alive  Alive    SOB  GYA17-3 2017-10-03
11  BHS_068 BHS_068_A Alive  Alive    SOB  GYA15-2 2015-09-11

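More specifically, grouping by GenIndID, I want to create a new date field holding the maximum SurveyDt, conditional on both Type and Fate. In addition, I want the maximum to be taken only over SurveyDt values where Status == "Alive". The code below produces all NA values, rather than the desired date for BHS_068, which meets all of the specified conditions.

I recently saw that case_when might be appropriate here, but I was not able to implement it correctly.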

dat %>% group_by(GenIndID) %>%
  mutate(NewDat = as.POSIXct(ifelse(Type == "SOB" & Fate == "Alive", max(SurveyDt[Status == "Alive"], na.rm = F), NA), 
                             origin='1970-01-01', na.rm=T)) %>%
  as.data.frame()

Any suggestions would be greatly appreciated.

【Comments】:

  • Could you provide a table showing what the desired output should look like?

Tags: r if-statement dplyr


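【Solution 1】:

If you want to stick with dplyr and use case_when, you have to make sure that the value returned by every case is of the same type.

Here the value for the matching rows is a date-time, so the default value must also be a date-time, which you get by wrapping NA in as.POSIXct.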

dat %>%
  group_by(GenIndID) %>%
  mutate(NewDat = case_when(Type == "SOB" & Fate == "Alive" ~ max(SurveyDt[Status == "Alive"], na.rm = TRUE),
                            TRUE ~ as.POSIXct(NA, origin = "1970-01-01")))
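A minimal sketch of how the fallback behaves, on made-up data (the `toy` data frame and its column names are hypothetical, not the asker's data). The final `TRUE ~ ...` line plays the role of the "else" branch: any row that matched no earlier condition gets that value. It also shows why `na.rm = TRUE` matters. In the question's data, row 6 is "Alive" with a missing SurveyDt, so with `na.rm = F` the maximum for that group is NA.

```r
library(dplyr)

# Hypothetical toy data: group "b" has one Alive row with a missing date
toy <- data.frame(
  id        = c("a", "a", "b", "b", "b", "b"),
  type      = c("Linked", "Linked", "SOB", "SOB", "SOB", "SOB"),
  status    = c("Alive", "Mort", "Alive", "Alive", "MIA", "Alive"),
  survey_dt = as.POSIXct(c("2013-09-19", "2014-08-08", "2013-09-19",
                           "2014-08-08", "2014-09-05", NA), tz = "UTC")
)

res <- toy %>%
  group_by(id) %>%
  mutate(new_dt = case_when(
    # rows that satisfy the condition get the latest Alive date in their group
    type == "SOB" ~ max(survey_dt[status == "Alive"], na.rm = TRUE),
    # every remaining row falls through to this default (the "else" part)
    TRUE ~ as.POSIXct(NA, tz = "UTC")
  )) %>%
  ungroup()

# Without na.rm = TRUE the missing date makes the whole maximum NA
max_without_narm <- max(toy$survey_dt[toy$id == "b" & toy$status == "Alive"])
```

In this sketch the "Linked" rows end up NA and the "SOB" rows get the latest "Alive" date, ignoring the later "MIA" survey.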

Using ifelse:

dat %>%
  group_by(GenIndID) %>%
  # base ifelse() drops the POSIXct class and returns numbers, so convert back
  mutate(NewDat = as.POSIXct(ifelse(Type == "SOB" & Fate == "Alive", 
                                    max(SurveyDt[Status == "Alive"], na.rm = TRUE), 
                                    NA),
                             origin = "1970-01-01"))
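One thing to watch with base `ifelse`: it keeps the values but strips attributes such as the class, so a POSIXct input comes back as plain numbers (seconds since 1970-01-01). A small base R sketch, using made-up vectors `dt` and `keep` that are not part of the question's data:

```r
# Hypothetical example vectors
dt   <- as.POSIXct(c("2013-09-19", "2014-08-08"), tz = "UTC")
keep <- c(TRUE, FALSE)

# ifelse() returns the underlying numbers, not date-times
raw <- ifelse(keep, dt, NA)
class(raw)  # "numeric"

# Converting back restores the date-time class
fixed <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")
```

This is why the result of `ifelse` has to be wrapped in `as.POSIXct(..., origin = "1970-01-01")` if the new column should display as dates.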

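【Comments】:

  • If ifelse can achieve the same result, I am not tied to case_when, since I am more familiar with the ifelse syntax.
  • For case_when, does TRUE ~ as.POSIXct(NA, origin = "1970-01-01") provide the else part of ifelse? That is, does it fill in the rows that do not meet the specified conditions? I could not work this out from the help file (with my R abilities...).
【Solution 2】: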

We can use data.table. After converting to a data.table (setDT(dat)), specify the logical comparison in i, group by 'GenIndID', and assign (:=) to 'NewDat' the max of 'SurveyDt' where 'Status' is "Alive".

library(data.table)
setDT(dat)[Type == "SOB" & Fate == "Alive",
         NewDat := max(SurveyDt[Status == "Alive"], na.rm = TRUE), by = GenIndID]
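A small sketch of what the `i` filter does here, again on made-up data (the `toy` table and its column names are hypothetical). Rows selected by `i` receive the group maximum. Rows not selected are left untouched, so they hold NA in the new column and no explicit "else" value is needed.

```r
library(data.table)

# Hypothetical toy data: group "b" has one Alive row with a missing date
toy <- data.table(
  id        = c("a", "a", "b", "b", "b", "b"),
  type      = c("Linked", "Linked", "SOB", "SOB", "SOB", "SOB"),
  status    = c("Alive", "Mort", "Alive", "Alive", "MIA", "Alive"),
  survey_dt = as.POSIXct(c("2013-09-19", "2014-08-08", "2013-09-19",
                           "2014-08-08", "2014-09-05", NA), tz = "UTC")
)

# i filters rows, j assigns by reference, by groups the filtered rows
toy[type == "SOB",
    new_dt := max(survey_dt[status == "Alive"], na.rm = TRUE),
    by = id]
```

Because `:=` modifies the table in place, `toy` itself gains the `new_dt` column and nothing needs to be reassigned.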
