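【Posted】: 2022-01-10 14:51:23
【Question】:
I need to subset the data frame d so that it keeps one row per ID. The row that is kept must contain I50 in AD or BD, and among those rows only the one with the earliest date should remain.
So in the end we will have a data frame with two rows (ID 1 and 2), each with I50 in AD/BD and the earliest possible date, so the dates will be 2007-12-12 and 2009-12-12.
I have tried a lot but could not find a solution.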
ID <- c(1,1,1,1,1,2,2,2,2,2)
AD <- c("DJ400", "DJ300", "DI501", "DI509", "DR409",
"DI509", "DJ200", "DA300", "DI500", "DR209")
Date <- as.Date(c("2010-12-12", "2011-12-12", "2007-12-12", "2008-12-12", "2009-12-12",
"2011-12-12", "2012-12-12", "2008-12-12", "2009-12-12", "2010-12-12"))
BD <- c("DI509", "DI500", "DI401", "DI409", "DR609",
"DI309", "DJ200", "DA300", "DI500", "DI509")
d <- data.frame(ID, AD, Date, BD)
hf <- subset(d, AD %in% "I50" | BD %in% "I50")
Created on 2022-01-10 by the reprex package (v2.0.0)
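One likely reason the `subset()` attempt returns no rows is that `%in%` tests for exact equality, and no code is exactly "I50". The sketch below is one possible base R approach, not necessarily the intended one: it uses `grepl()` for a substring match, then sorts by date and keeps the first row per ID. The names `hits` and `hf_fixed` are made up for illustration.

```r
# Data from the first reprex in the question
ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
AD <- c("DJ400", "DJ300", "DI501", "DI509", "DR409",
        "DI509", "DJ200", "DA300", "DI500", "DR209")
Date <- as.Date(c("2010-12-12", "2011-12-12", "2007-12-12", "2008-12-12", "2009-12-12",
                  "2011-12-12", "2012-12-12", "2008-12-12", "2009-12-12", "2010-12-12"))
BD <- c("DI509", "DI500", "DI401", "DI409", "DR609",
        "DI309", "DJ200", "DA300", "DI500", "DI509")
d <- data.frame(ID, AD, Date, BD)

# %in% compares whole strings, so "DI501" %in% "I50" is FALSE
# and this subset is empty
nrow(subset(d, AD %in% "I50" | BD %in% "I50"))

# grepl() with fixed = TRUE looks for the literal substring "I50"
hits <- subset(d, grepl("I50", AD, fixed = TRUE) | grepl("I50", BD, fixed = TRUE))

# Sort by ID and Date, then keep the first (earliest) row of each ID
hits <- hits[order(hits$ID, hits$Date), ]
hf_fixed <- hits[!duplicated(hits$ID), ]
hf_fixed
```

With this data, `hf_fixed` holds one row for each ID, dated 2007-12-12 and 2009-12-12, which matches the result described in the question.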
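After the first solution I ran into a problem, so I made some small changes; here is the new reprex. I need only one row per ID. The problem is that several rows share the same date, which I had not included before.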
ID <- c(1,1,1,1,1,2,2,2,2,2)
AD <- c("DJ400", "DJ300", "DI501", "DI509", "DR409",
"DI509", "DJ200", "DA300", "DI500", "DR209")
Date <- as.Date(c("2010-12-12", "2011-12-12", "2010-12-12", "2012-12-12", "2009-12-12",
"2011-12-12", "2012-12-12", "2012-12-12", "2009-12-12", "2010-12-12"))
BD <- c("DI509", "DI500", "DI401", "DI409", "DR609",
"DI309", "DJ200", "DA300", "DI500", "DI509")
d <- data.frame(ID, AD, Date, BD)
library(dplyr)
d %>%
group_by(ID) %>%
filter(if_any(c(AD, BD), ~ grepl("I50", .))) %>%
slice_min(Date) %>%
ungroup()
#> # A tibble: 3 x 4
#>      ID AD    Date       BD   
#>   <dbl> <chr> <date>     <chr>
#> 1     1 DJ400 2010-12-12 DI509
#> 2     1 DI501 2010-12-12 DI401
#> 3     2 DI500 2009-12-12 DI500
Created on 2022-01-11 by the reprex package (v2.0.1)
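The three-row result comes from `slice_min()` keeping every row tied for the minimum, which is its default behaviour. A minimal sketch of one way to force a single row per ID is to pass `with_ties = FALSE`. It assumes a dplyr version that provides `if_any()` and `slice_min()`, and it assumes any one of the tied rows is acceptable. The name `res` is made up for illustration.

```r
library(dplyr)

# Same columns as the second reprex; the fourth date is taken to be
# 2012-12-12 here (an assumption)
ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
AD <- c("DJ400", "DJ300", "DI501", "DI509", "DR409",
        "DI509", "DJ200", "DA300", "DI500", "DR209")
Date <- as.Date(c("2010-12-12", "2011-12-12", "2010-12-12", "2012-12-12", "2009-12-12",
                  "2011-12-12", "2012-12-12", "2012-12-12", "2009-12-12", "2010-12-12"))
BD <- c("DI509", "DI500", "DI401", "DI409", "DR609",
        "DI309", "DJ200", "DA300", "DI500", "DI509")
d <- data.frame(ID, AD, Date, BD)

res <- d %>%
  group_by(ID) %>%
  # keep rows where AD or BD contains the literal substring "I50"
  filter(if_any(c(AD, BD), ~ grepl("I50", ., fixed = TRUE))) %>%
  # with_ties = FALSE returns exactly one row per group even when
  # several rows share the earliest date
  slice_min(Date, n = 1, with_ties = FALSE) %>%
  ungroup()

res
```

If it matters which of the tied rows survives, an explicit tie-breaker is probably safer, for example sorting with `arrange(Date, AD)` and then calling `slice_head(n = 1)` within each group.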
【Comments】:
Tags: r date subset conditional-formatting