【Question Title】: How to Create Values based on Start-Stop Info in Separate Column
【Posted】: 2019-04-22 19:53:10
【Question Description】:

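I have a very messy dataset created by a research device. The data show a physiological measurement ("Physio") every few milliseconds ("Time"). The output also lists several user messages, such as when a trial starts ("START_TRIAL n"), when a trial ends ("STOP_TRIAL"), and other miscellaneous information that may interest the researcher. Sometimes the "START_TRIAL n" message is repeated on consecutive rows, and when there is no message, the otherwise empty cell holds a plain "0".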

I want to create a new column that indicates which trial the current case (row) belongs to (see the example data below).

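Is there a way to do this with dplyr and mutate? I wondered whether I might need if-then statements to change the value of the new column for each case, but surely there is a more elegant solution? (Thanks in advance for helping this newbie!)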

Time    Physio  Cond
1   34  START_TRIAL 1
2   33  0
3   25  RANDOM_MSG
4   43  STOP_TRIAL
5   27  START_TRIAL 2
6   54  START_TRIAL 2
7   32  0
8   54  RANDOM_MSG
9   23  STOP_TRIAL

structure(list(Time = 1:9, Physio = c(34L, 33L, 25L, 43L, 27L, 
54L, 32L, 54L, 23L), Cond = structure(c(4L, 2L, 3L, 6L, 5L, 5L, 
2L, 3L, 6L), .Label = c("", "0", "RANDOM_MSG", "START_TRIAL 1", 
"START_TRIAL 2", "STOP_TRIAL"), class = "factor")), .Names = c("Time", 
"Physio", "Cond"), row.names = c(NA, 9L), class = "data.frame")

into

Time    Physio  Trial   Cond
1   34  1   START_TRIAL 1
2   33  1   0
3   25  1   RANDOM_MSG
4   43  1   STOP_TRIAL
5   27  2   START_TRIAL 2
6   54  2   START_TRIAL 2
7   32  2   0
8   54  2   RANDOM_MSG
9   23  2   STOP_TRIAL

structure(list(Time = 1:9, Physio = c(34L, 33L, 25L, 43L, 27L, 
54L, 32L, 54L, 23L), Trial = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L), Cond = structure(c(4L, 2L, 3L, 6L, 5L, 5L, 2L, 3L, 6L), .Label = c("", 
"0", "RANDOM_MSG", "START_TRIAL 1", "START_TRIAL 2", "STOP_TRIAL"
), class = "factor")), .Names = c("Time", "Physio", "Trial", 
"Cond"), row.names = c(NA, 9L), class = "data.frame")

【Discussion】:

    Tags: r dataframe dplyr tidyr


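    【Solution 1】:

    One option is to identify the "START_TRIAL" entries with grep, use match to get the index of each one among the unique start labels, and then fill the NA elements with the previous adjacent non-NA element.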

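    library(dplyr)
    library(tidyr)
    df1 %>% 
       # index each row against the distinct "START_TRIAL n" labels; all other rows become NA
       mutate(Trial = match(PhysioCond, unique(grep("START_TRIAL", 
                 PhysioCond, value = TRUE)))) %>% 
       # carry the last non-NA trial index down to the rows that follow it
       fill(Trial)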
    #    Time    PhysioCond Trial
    #1   34 START_TRIAL 1     1
    #2   33             0     1
    #3   25    RANDOM_MSG     1
    #4   43    STOP_TRIAL     1
    #5   27 START_TRIAL 2     2
    #6   54 START_TRIAL 2     2
    #7   32             0     2
    #8   54    RANDOM_MSG     2
    #9   23    STOP_TRIAL     2
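
The output above comes from sample data in which Physio and Cond are merged into a single PhysioCond column. As a minimal sketch, the same pipeline can be applied to the three columns shown in the question. The data frame name `dat` and the helper `start_labels` are assumptions made for illustration; `Cond` is kept as a factor, as in the question's dput output, and converted with as.character() before matching.

```r
library(dplyr)
library(tidyr)

# Hypothetical name `dat`; values copied from the question's example
dat <- data.frame(
  Time   = 1:9,
  Physio = c(34L, 33L, 25L, 43L, 27L, 54L, 32L, 54L, 23L),
  Cond   = factor(c("START_TRIAL 1", "0", "RANDOM_MSG", "STOP_TRIAL",
                    "START_TRIAL 2", "START_TRIAL 2", "0", "RANDOM_MSG",
                    "STOP_TRIAL"))
)

# Distinct start messages, in order of first appearance
start_labels <- unique(grep("^START_TRIAL", as.character(dat$Cond), value = TRUE))

result <- dat %>%
  # position of each row's message among the start labels; NA for non-start rows
  mutate(Trial = match(as.character(Cond), start_labels)) %>%
  # fill NA values downward from the most recent start row
  fill(Trial) %>%
  # reorder columns to match the desired output in the question
  select(Time, Physio, Trial, Cond)

print(result)
```

Because fill() works downward by default, any rows before the first START_TRIAL message would stay NA, and rows between a STOP_TRIAL and the next START_TRIAL would inherit the previous trial's index.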
    

    Note: the column names are not clear, but the logic should work

    Data
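
One point worth keeping in mind about the match() step: it numbers trials by the order in which their start messages first appear, which equals the number in "START_TRIAL n" only when trials begin at 1 and occur in increasing order. If that cannot be assumed, a possible variant (a sketch, not part of the original answer) is to parse the number out of the message itself. The data frame `demo`, its values, and the helper column `is_start` are hypothetical and exist only to illustrate the difference.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: trial numbers 3 and 7 rather than 1 and 2
demo <- data.frame(
  Time = 1:7,
  Cond = c("START_TRIAL 3", "0", "STOP_TRIAL",
           "START_TRIAL 7", "START_TRIAL 7", "RANDOM_MSG", "STOP_TRIAL"),
  stringsAsFactors = FALSE
)

demo_trials <- demo %>%
  mutate(
    # TRUE on rows whose message begins with "START_TRIAL"
    is_start = grepl("^START_TRIAL", Cond),
    # on start rows keep only the text after the prefix; NA elsewhere,
    # then convert that text to an integer trial number
    Trial = as.integer(ifelse(is_start,
                              sub("^START_TRIAL\\s+", "", Cond),
                              NA_character_))
  ) %>%
  # carry each trial number down to the rows that follow its start message
  fill(Trial) %>%
  select(-is_start)

print(demo_trials)
```

With this input, the match()-based version would label the trials 1 and 2, whereas this variant keeps the numbers 3 and 7 that are written in the messages.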

    df1 <- structure(list(Time = c(34L, 33L, 25L, 43L, 27L, 54L, 32L, 54L, 
     23L), PhysioCond = c("START_TRIAL 1", "0", "RANDOM_MSG", "STOP_TRIAL", 
    "START_TRIAL 2", "START_TRIAL 2", "0", "RANDOM_MSG", "STOP_TRIAL"
     )), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
      "6", "7", "8", "9"))
    

    【Discussion】:

    • @akrun I like your answer because your answer rocks!!! Thank you very much