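【Question title】: How to summarise time-series data with unequal number of observations with R
【Posted】: 2015-04-10 22:21:54
【Question】:

I have a large data frame (86,000 rows) covering several patients, each of whom had multiple blood tests during their hospital stay (only 3 tests: T1, T2 and T3). Some of these patients stayed for 3 days, others for as long as 168 days.

This is just a small part of the output of the count function, showing the huge variation in length of stay: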

No  Id     Days
148 29757  111
149 30368   36
150 31062   29
151 31993   24
152 32198   51
153 32438    6
154 32836   74
155 32944   24
156 33467   39
157 36108   90
158 36849    6
159 37136    3

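I used aggregate to compute means and so on, but I would like to see who actually improved or deteriorated during their stay.

I think this would involve at least extracting the first and last test and taking the difference (lower is better). But I cannot find a way to do this.

I thought an easier solution would be to convert all results to ordinal data (according to each test's normal range) and see how many of them have abnormally low or abnormally high values. Unfortunately, almost every patient has both lows and highs.

Ideally, I would like to see the progress of several patients (or groups of patients) over time. But because they were hospitalised over different time spans, the (oversimplified) result looks like this:

As you can see, the first patient (red dots) starts with medium values, deteriorates quickly (high values), then improves (low values). The second patient's progress is unclear, probably because his/her stay was short.

Can anyone suggest a starter (code or idea)? I checked some questions about multiple time-series plots with unequal observations, but they did not help in my case. Here is a sample (anonymised) data set: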

structure(list(Id = c("10200", "10200", "10200", "10200", "10200", 
"10200", "10700", "10700", "10700", "10700", "10700", "10700", 
"10700", "10700", "10700", "10700", "10700", "10700", "10700", 
"10700", "10700", "10766", "10766", "10766", "10766", "10766", 
"10766", "10766", "10766", "10766", "10766", "10766", "10766", 
"10766", "10766", "10766", "10766", "10766", "10766", "10766"
), Date = structure(c(15068, 15068, 15068, 15069, 15069, 15069, 
15072, 15072, 15072, 15072, 15072, 15072, 15073, 15073, 15073, 
15075, 15075, 15075, 15078, 15078, 15078, 15073, 15074, 15074, 
15075, 15075, 15075, 15075, 15076, 15076, 15076, 15078, 15078, 
15078, 15081, 15082, 15083, 15084, 15085, 15085), class = "Date"), 
    Test = c("T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", 
    "T2", "T2", "T3", "T3", "T1", "T2", "T3", "T1", "T2", "T3", 
    "T1", "T2", "T3", "T1", "T1", "T2", "T1", "T1", "T2", "T2", 
    "T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", "T2", "T1", 
    "T1", "T2"), Result = c(131, 4.53, 5.4, 108, 3.19, 3.7, 125, 
    NA, 1.26, NA, NA, 3.8, 125, 0.97, 4.2, 73, 0.84, 6.6, 48, 
    0.52, 4.8, 60, 75, 0.83, 52, 51, 0.62, 0.65, 40, 0.57, 4.1, 
    45, 0.54, 3.7, 96, 77, 1.04, 134, 144, 0.95)), .Names = c("Id", 
"Date", "Test", "Result"), row.names = c(3L, 6L, 4L, 2L, 1L, 
5L, 10L, 14L, 9L, 19L, 8L, 11L, 20L, 18L, 7L, 17L, 13L, 21L, 
12L, 15L, 16L, 22L, 28L, 29L, 24L, 31L, 26L, 33L, 34L, 32L, 37L, 
23L, 35L, 25L, 38L, 36L, 30L, 27L, 39L, 40L), class = "data.frame")

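【Comments】:

    Tags: r time-series


    【Solution 1】:

    I don't know whether this is what you want, but you can use the dplyr package. The code below groups the data by "Id", finds the first and last values in Result, and finally computes the "difference" in a new column.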

    mydata <- structure(list(Id = c("10200", "10200", "10200", "10200", "10200", "10200", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10700", "10766", "10766", "10766",
    "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766", "10766"), Date = structure(c(15068, 15068, 15068, 15069, 15069, 15069, 15072, 15072, 15072, 15072, 15072, 15072, 15073, 15073,
    15073, 15075, 15075, 15075, 15078, 15078, 15078, 15073, 15074, 15074, 15075, 15075, 15075, 15075, 15076, 15076, 15076, 15078, 15078, 15078, 15081, 15082, 15083, 15084, 15085, 15085), class = "Date"), Test = c("T1", "T2", "T3", "T1", "T2", "T3", "T1",
    "T1", "T2", "T2", "T3", "T3", "T1", "T2", "T3", "T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", "T2", "T1", "T1", "T2", "T2", "T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", "T2", "T1", "T1", "T2"), Result = c(131, 4.53, 5.4, 108, 3.19, 3.7, 125, NA, 1.26,
    NA, NA, 3.8, 125, 0.97, 4.2, 73, 0.84, 6.6, 48, 0.52, 4.8, 60, 75, 0.83, 52, 51, 0.62, 0.65, 40, 0.57, 4.1, 45, 0.54, 3.7, 96, 77, 1.04, 134, 144, 0.95)), .Names = c("Id", "Date", "Test", "Result"), row.names = c(3L, 6L, 4L, 2L, 1L, 5L, 10L, 14L, 9L, 19L,
    8L, 11L, 20L, 18L, 7L, 17L, 13L, 21L, 12L, 15L, 16L, 22L, 28L, 29L, 24L, 31L, 26L, 33L, 34L, 32L, 37L, 23L, 35L, 25L, 38L, 36L, 30L, 27L, 39L, 40L), class = "data.frame")
    
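    library(dplyr)

    # Group by patient, take the first and last Result in row order,
    # then compute the change. summarise() replaces summarise_each(funs(...)),
    # which is deprecated in current dplyr.
    # Note: grouping by Id alone mixes T1, T2 and T3, and NA values are kept.
    result <- mydata %>%
      group_by(Id) %>%
      summarise(first = dplyr::first(Result), last = dplyr::last(Result)) %>%
      mutate(difference = first - last)
    result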
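
Since the question asks about change within each test, a possible variation is to group by both Id and Test, drop missing results, and sort by Date before taking the first and last values. This is only a sketch, not part of the original answer: the data frame below is a small excerpt of the sample data, the names per_test, first_value, last_value and n_obs are made up for illustration, and `.groups = "drop"` assumes dplyr 1.0 or later.

```r
library(dplyr)

# Small excerpt of the sample data from the question (same four columns).
mydata <- data.frame(
  Id     = c("10200", "10200", "10200", "10200",
             "10700", "10700", "10700", "10700", "10700"),
  Date   = as.Date(c(15068, 15068, 15069, 15069,
                     15072, 15072, 15073, 15075, 15078),
                   origin = "1970-01-01"),
  Test   = c("T1", "T2", "T1", "T2", "T1", "T1", "T1", "T1", "T1"),
  Result = c(131, 4.53, 108, 3.19, 125, NA, 125, 73, 48),
  stringsAsFactors = FALSE
)

per_test <- mydata %>%
  filter(!is.na(Result)) %>%        # ignore missing results
  arrange(Id, Test, Date) %>%       # make "first" and "last" chronological
  group_by(Id, Test) %>%            # one row per patient and test
  summarise(
    first_value = first(Result),
    last_value  = last(Result),
    n_obs       = n(),
    .groups     = "drop"
  ) %>%
  # Same sign convention as the answer: positive means the value fell.
  mutate(difference = first_value - last_value)

per_test
```

With "lower is better", as stated in the question, a positive difference would suggest improvement for that test. Keeping n_obs makes it easy to spot patients whose change rests on only two or three measurements.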
    

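For comparing patients whose stays start on different dates and last different lengths, one option is to replace the calendar date with the number of days since each patient's first recorded test, so that every series starts at day 0. The sketch below assumes the first test date can stand in for the admission date, which the question does not confirm; the column name day and the object name aligned are made up for illustration, and the data is again a small excerpt of the T1 results from the sample.

```r
library(dplyr)

# Excerpt of the T1 results for two patients from the sample data.
mydata <- data.frame(
  Id     = c("10700", "10700", "10700", "10700",
             "10766", "10766", "10766", "10766"),
  Date   = as.Date(c(15072, 15073, 15075, 15078,
                     15073, 15074, 15076, 15078),
                   origin = "1970-01-01"),
  Test   = "T1",
  Result = c(125, 125, 73, 48, 60, 75, 40, 45),
  stringsAsFactors = FALSE
)

aligned <- mydata %>%
  group_by(Id) %>%
  # Assumption: the earliest test date is treated as day 0 of the stay.
  mutate(day = as.integer(Date - min(Date))) %>%
  ungroup() %>%
  filter(!is.na(Result)) %>%
  arrange(Id, Test, day)

# Optional quick look with base graphics: one line per patient for T1.
if (interactive()) {
  t1 <- filter(aligned, Test == "T1")
  plot(t1$day, t1$Result, type = "n",
       xlab = "Days since first test", ylab = "T1 result")
  for (id in unique(t1$Id)) {
    one <- t1[t1$Id == id, ]
    lines(one$day, one$Result, type = "b")
  }
}
```

The day column is computed before rows with missing results are dropped, so a stay that begins with a missing value still counts from its first recorded date.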