【Posted】: 2017-11-09 17:38:00
【Question】:
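I am working with a data frame that contains the locations of patients during their hospital stay. It is formatted so that each row represents one new location (= department, room, bed) of a person (= ID) during a specific time period (from BeginTime to EndTime).
Here is a sample of the initial data frame: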
print(data_perlocation[1:14,])
        ID department BeginTime                   EndTime                      room   Bed
     <dbl> <chr>      <chr>                       <chr>                       <dbl> <dbl>
 1 2156864 L14B       2016-03-02 09:40:00.0000000 2016-03-02 15:20:00.0000000   102     3
 2 2161034 B51A       2016-06-07 00:00:00.0000000 2016-06-07 11:02:00.0000000   109     2
 3 2161034 B51A       2016-06-06 09:00:00.0000000 2016-06-06 10:27:00.0000000   109     2
 4 2161034 B51A       2016-06-06 12:47:00.0000000 2016-06-07 00:00:00.0000000   109     2
 5 2161034 B51A       2016-06-06 10:27:00.0000000 2016-06-06 12:47:00.0000000   103     3
 6 2176442 L14B       2016-02-04 07:15:00.0000000 2016-02-04 13:47:00.0000000   101     4
 7 2176754 B61A       2016-03-15 07:16:00.0000000 2016-03-15 14:56:00.0000000   109     3
 8 2176754 B61A       2016-03-16 08:10:00.0000000 2016-03-17 00:00:00.0000000   109     3
 9 2176754 B61A       2016-03-15 14:56:00.0000000 2016-03-16 08:10:00.0000000   109     2
10 2176754 B61A       2016-03-17 00:00:00.0000000 2016-03-17 11:18:00.0000000   109     3
11 2184060 B61A       2016-03-10 20:25:00.0000000 2016-03-11 00:00:00.0000000   105     2
12 2184060 B61A       2016-03-10 20:01:00.0000000 2016-03-10 20:25:00.0000000   105     1
13 2184060 B61A       2016-03-11 00:00:00.0000000 2016-03-12 00:00:00.0000000   105     2
14 2184060 B61A       2016-03-12 00:00:00.0000000 2016-03-12 14:00:00.0000000   105     2
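I want to transform this data frame so that I have one row per day. To do that, I created a new data frame with columns holding the ID and the dates of the admission. Like this: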
ID Date
1 2156864 2016-03-02
2 2161034 2016-06-06
3 2161034 2016-06-07
4 2176442 2016-02-04
5 2176754 2016-03-15
6 2176754 2016-03-16
7 2176754 2016-03-17
8 2184060 2016-03-10
9 2184060 2016-03-11
10 2184060 2016-03-12
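A per-day table like the one above can be derived directly from the location table. The following is only a minimal sketch in base R: `loc` and `byday` are hypothetical names, and `loc` is a three-row stand-in copied from the sample rather than the full data. It assumes the calendar date is always the first 10 characters of BeginTime, and it only produces days on which at least one row begins.

```r
# Small stand-in for the location table (three rows copied from the sample above)
loc <- data.frame(
  ID        = c(2161034, 2161034, 2161034),
  BeginTime = c("2016-06-07 00:00:00.0000000",
                "2016-06-06 09:00:00.0000000",
                "2016-06-06 12:47:00.0000000"),
  stringsAsFactors = FALSE
)

# Assumption: the first 10 characters of BeginTime are the date (yyyy-mm-dd)
loc$Date <- substr(loc$BeginTime, 1, 10)

# Keep one row per distinct ID/Date pair, sorted by ID and then by date
byday <- unique(loc[, c("ID", "Date")])
byday <- byday[order(byday$ID, byday$Date), ]
rownames(byday) <- NULL

print(byday)
```

Because the dates are in ISO order, sorting them as character strings gives chronological order without converting to Date first.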
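Now I want to add the (multiple) locations that occur on each day in the data_bylocation data frame to the rows of data_byday, matching on ID and on the date of BeginTime being equal to Date.
I ended up combining a for loop with two if statements. So far my attempts have not produced anything close to the desired result, and I think there must be an easier way to do this. My last attempt looked like this: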
data_perday[,3] <- NA
for (index in 1:nrow(data_perlocation)) {
  if (data_perlocation$ID[index] == data_perday$ID &
      as.Date(as.character(data_perlocation$BeginTime[index]), format = "%Y-%m-%d") ==
        as.Date(data_perday$Date, format = "%Y-%m-%d")) {
    if (is.na(data_perday[index, 3])) {
      ## code to assign the location and times for that day
    } else {
      ## code to assign the second location and times for that day
    }
  }
}
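One probable reason this loop misbehaves is that the condition compares a single value with a whole column, so it yields one TRUE/FALSE per row of data_perday, while `if` expects a single value (R 4.2.0 and later raise an error here; older versions warn and use only the first element). In addition, `data_perday[index, 3]` is indexed with the row number of the location table, not with the matching day row. The sketch below uses hypothetical miniature objects (`perday`, `loc_id`, `loc_date`) to show what the comparison returns and how `which()` recovers the matching row.

```r
# Hypothetical miniature version of the per-day table
perday <- data.frame(ID   = c(2161034, 2161034),
                     Date = c("2016-06-06", "2016-06-07"),
                     stringsAsFactors = FALSE)

# One location record, reduced to its ID and the date of its BeginTime
loc_id   <- 2161034
loc_date <- "2016-06-07"

# Comparing one value with a column gives one TRUE/FALSE per row
matches <- perday$ID == loc_id & perday$Date == loc_date
print(matches)

# which() turns the logical vector into the row number(s) of the per-day table
target_rows <- which(matches)
print(target_rows)
```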
The desired result looks like this:
ID Date BeginTime1 EndTime1 department1 room1 bed1 BeginTime2 EndTime2 department2 room2 bed2 [3rd location, etc]
1 2156864 2016-03-02 [first location of this day] [second location of this day]
2 2161034 2016-06-06
3 2161034 2016-06-07
4 2176442 2016-02-04
5 2176754 2016-03-15
6 2176754 2016-03-16
7 2176754 2016-03-17
8 2184060 2016-03-10
9 2184060 2016-03-11
10 2184060 2016-03-12
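A layout like this can be sketched without a loop by numbering the locations within each ID/day and then reshaping from long to wide. This is one possible approach in base R, not necessarily what the commenters had in mind: `loc`, `visit`, and `wide` are hypothetical names, `loc` holds only the four sample rows of ID 2161034, and a location is attributed to the day on which its BeginTime falls. `reshape()` is written for plain data frames, so converting a tibble with `as.data.frame()` first is the cautious route.

```r
# Four sample rows of ID 2161034, as a plain data frame
loc <- data.frame(
  ID         = c(2161034, 2161034, 2161034, 2161034),
  department = c("B51A", "B51A", "B51A", "B51A"),
  BeginTime  = c("2016-06-07 00:00:00.0000000", "2016-06-06 09:00:00.0000000",
                 "2016-06-06 12:47:00.0000000", "2016-06-06 10:27:00.0000000"),
  EndTime    = c("2016-06-07 11:02:00.0000000", "2016-06-06 10:27:00.0000000",
                 "2016-06-07 00:00:00.0000000", "2016-06-06 12:47:00.0000000"),
  room       = c(109, 109, 109, 103),
  Bed        = c(2, 2, 2, 3),
  stringsAsFactors = FALSE
)

# Assumption: the day of a location is the date part of its BeginTime
loc$Date <- substr(loc$BeginTime, 1, 10)

# Sort chronologically, then number the locations within each ID/Date group
loc <- loc[order(loc$ID, loc$BeginTime), ]
loc$visit <- ave(seq_along(loc$ID), loc$ID, loc$Date, FUN = seq_along)

# Long to wide: one row per ID/Date, one set of columns per location number
wide <- reshape(loc, idvar = c("ID", "Date"), timevar = "visit",
                direction = "wide", sep = "")

print(names(wide))
print(wide[, c("ID", "Date", "room1", "room2", "room3")])
```

Days with fewer locations than the busiest day get NA in the unused columns.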
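I am quite new to R and still learning. I have been stuck on this problem for a while, so any pointer in the right direction is much appreciated!
Edit:
Reproducible example: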
data_byday <- structure(list(ID = c(2156864, 2161034, 2161034, 2176442, 2176754, 2176754, 2176754, 2184060, 2184060, 2184060), Date = c("2016-03-02", "2016-06-06", "2016-06-07", "2016-02-04", "2016-03-15", "2016-03-16", "2016-03-17", "2016-03-10", "2016-03-11", "2016-03-12")), .Names = c("ID", "Date"), row.names = c(NA, 10L), class = "data.frame")
data_bylocation <- structure(list(ID = c(2156864, 2161034, 2161034, 2161034, 2161034, 2176442, 2176754, 2176754, 2176754, 2176754, 2184060, 2184060, 2184060, 2184060), department = c("L14B", "B51A", "B51A", "B51A", "B51A", "L14B", "B61A", "B61A", "B61A", "B61A", "B61A", "B61A", "B61A", "B61A"), BeginTime = c("2016-03-02 09:40:00.0000000", "2016-06-07 00:00:00.0000000", "2016-06-06 09:00:00.0000000", "2016-06-06 12:47:00.0000000", "2016-06-06 10:27:00.0000000", "2016-02-04 07:15:00.0000000", "2016-03-15 07:16:00.0000000", "2016-03-16 08:10:00.0000000", "2016-03-15 14:56:00.0000000", "2016-03-17 00:00:00.0000000", "2016-03-10 20:25:00.0000000", "2016-03-10 20:01:00.0000000", "2016-03-11 00:00:00.0000000", "2016-03-12 00:00:00.0000000"), EndTime = c("2016-03-02 15:20:00.0000000", "2016-06-07 11:02:00.0000000", "2016-06-06 10:27:00.0000000", "2016-06-07 00:00:00.0000000", "2016-06-06 12:47:00.0000000", "2016-02-04 13:47:00.0000000", "2016-03-15 14:56:00.0000000", "2016-03-17 00:00:00.0000000", "2016-03-16 08:10:00.0000000", "2016-03-17 11:18:00.0000000", "2016-03-11 00:00:00.0000000", "2016-03-10 20:25:00.0000000", "2016-03-12 00:00:00.0000000", "2016-03-12 14:00:00.0000000"), room = c(102, 109, 109, 109, 103, 101, 109, 109, 109, 109, 105, 105, 105, 105), Bed = c(3, 2, 2, 2, 3, 4, 3, 3, 2, 3, 2, 1, 2, 2)), .Names = c("ID", "department", "BeginTime", "EndTime", "room", "Bed"), row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))
Second example:
data_bylocation2 <- structure(list(ID = c(2224003, 2224003, 2224003, 2248787, 2248787,2248787, 2248787, 2248787), department = c("B12A", "B12A", "B12A","B53A", "B53A", "B53A", "B53A", "B53A"), BeginTime = c("2016-02-12 08:00:00.0000000", "2016-02-12 13:40:00.0000000", "2016-02-15 00:00:00.0000000", "2016-04-20 10:00:00.0000000", "2016-04-22 00:00:00.0000000", "2016-04-23 00:00:00.0000000", "2016-04-24 11:47:00.0000000", "2016-04-26 00:00:00.0000000"), EndTime = c("2016-02-12 13:40:00.0000000", "2016-02-15 00:00:00.0000000", "2016-02-15 16:17:00.0000000", "2016-04-22 00:00:00.0000000", "2016-04-23 00:00:00.0000000", "2016-04-24 11:47:00.0000000", "2016-04-26 00:00:00.0000000", "2016-04-26 16:00:00.0000000"), room = c(205, 209, 209, 306, 306, 306, 311, 311), bed = c(3, 1, 1, 2, 2, 2, 4, 4)), .Names = c("ID", "department", "BeginTime", "EndTime", "room", "bed"), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))
【Comments】:
-
Why are there months/days in your expected output that are not in your input?
-
Indeed, I had posted two different samples from the larger data frame. I have edited the question so that the IDs of the samples now match.
-
Looks like a one-liner with data.table's dcast, but I'm too lazy to reproduce your data myself: could you provide it as "data
-
Is this (see the edit) sufficient as a reproducible example?
-
@PeterPan I would love to see how to do this with a one-liner in data.table (it took me 10).