【Posted】: 2020-12-17 02:01:08
【Question】:
I'm hoping you can help me with subsetting.
Data 1
df2 <- structure(
list(
record_id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
day_count = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
event = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0)),
row.names = c(NA, -40L),
class = c("tbl_df", "tbl", "data.frame"))
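My goal: I want to flag two windows of observation days relative to the event and store the flag in a new hazard variable: the 5 observation days up to and including the event day (hazard = 1), and the 5 observation days that end 11 days before the event (hazard = 2). In the sample data the event falls on day 19, so hazard = 1 covers days 15 to 19 and hazard = 2 covers days 4 to 8. See the output below:
Expected output 1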
df2_output <- structure(
list(
record_id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
day_count = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
event = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0),
hazard = c(NA,NA,NA,2,2,2,2,2,NA,NA,NA,NA,NA,NA,1,1,1,1,1,NA,NA,NA,NA,2,2,2,2,2,NA,NA,NA,NA,NA,NA,1,1,1,1,1,NA)),
row.names = c(NA, -40L),
class = c("tbl_df", "tbl", "data.frame"))
Attempt: I tried the following code, which works on the sample data:
test_df2 <- df2 %>%
mutate(hazard = case_when(
(day_count <= df2$day_count[df2$event == 1]) & (day_count > ((df2$day_count[df2$event == 1]) -5)) ~ 1,
(day_count <= ((df2$day_count[df2$event == 1]) -11)) & (day_count > ((df2$day_count[df2$event == 1]) -16)) ~ 2
)) %>%
view()
Problem: However, when I tried the same idea on my main dataset, I got the following error:
Error: Problem with `mutate()` input `hazard`.
x Input `hazard` can't be recycled to size 553.
ℹ Input `hazard` is `case_when(...)`.
ℹ Input `hazard` must be size 553 or 1, not 82.
ℹ The error occurred in group 30: record_id = 120001.
Run `rlang::last_error()` to see where the error occurred.
In addition: There were 50 or more warnings (use warnings() to see the first 50)
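I have already made sure to remove events that do not have enough days to look back over.
I also tried selecting just one record ID to test the code, but got the following warning: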
4: In day_count > ((case_series_analysis$day_count[case_series_analysis$te_yn == :
longer object length is not a multiple of shorter object length
Does anyone have any ideas?
【Comments】:
-
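dplyr functions expect unquoted column names, not data$column. If you remove every df2$ and keep just the column names, that will at least get you closer to working. -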
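A minimal sketch of what that suggestion could look like, combined with group_by(record_id) so the event day is looked up per record. This is not code from the thread: the event_day helper column is hypothetical, the small df2 below is a stand-in with the same shape as the question's data, and the sketch assumes every record_id has exactly one row with event == 1.

```r
library(dplyr)

# Stand-in for the question's df2: two records, 20 days each,
# with a single event on day 19 of each record.
df2 <- tibble(
  record_id = rep(c(1, 2), each = 20),
  day_count = rep(1:20, times = 2),
  event     = as.numeric(rep(1:20, times = 2) == 19)
)

test_df2 <- df2 %>%
  group_by(record_id) %>%
  mutate(
    # Assumption: exactly one event row per record_id, so this has length 1
    # and is recycled across the rows of the group.
    event_day = day_count[event == 1],
    hazard = case_when(
      # 5 days up to and including the event day
      day_count <= event_day      & day_count > event_day - 5  ~ 1,
      # 5 days ending 11 days before the event day
      day_count <= event_day - 11 & day_count > event_day - 16 ~ 2
    )
  ) %>%
  ungroup() %>%
  select(-event_day)
```

Rows matched by neither condition are left as NA, which is what the expected output shows. A record with no event row, or with more than one, would break the length-1 assumption and would need to be filtered out or handled separately first.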
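In addition to @GregorThomas's comment, the cause of the error is that you are subsetting the data in day_count <= df2$day_count[df2$event == 1]. The subset can be shorter than the column, which causes its values to be recycled and gives incorrect results. The comparison is element-wise, so if the rhs has a different length, R tries to recycle its values to make the lengths match.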
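The recycling described in that comment can be reproduced in base R with a toy example. The vectors below are made up for illustration and are not taken from the asker's data:

```r
day_count  <- 1:6
event_days <- c(2, 5)  # hypothetical: the subset returned two event days

# The shorter vector is silently recycled to c(2, 5, 2, 5, 2, 5),
# so each day is compared against an alternating event day.
recycled <- day_count <= event_days
recycled
# TRUE TRUE FALSE TRUE FALSE FALSE

# When the longer length is not a multiple of the shorter one, R still
# recycles but raises the warning quoted in the question.
msg <- tryCatch(
  1:5 <= event_days,
  warning = function(w) conditionMessage(w)
)
msg
```

In the first comparison no warning is raised at all, which is why the attempt can appear to work on small sample data while producing wrong flags on a larger dataset.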