【Question title】: mutate per group by in R
【Posted】: 2018-11-28 18:39:03
【Question】:

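I want to identify pieces in sensor data and give each one an ID. To do that, I want to group the following dataset by the sensor column and check whether the values column switches from 0 to 1. When a sensor detects its first piece, caseid switches to 1 (as in the hand-made caseid column). As long as the value stays 1, caseid stays 1. When the value goes back to 0, caseid should go back to 0. On the next switch from 0 to 1, caseid should become 2, because the sensor has detected a second piece, and so on.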

time = c("07:00:01","07:00:01","07:00:01","07:00:02","07:00:02","07:00:02","07:00:03","07:00:03","07:00:03","07:00:04",
     "07:00:04","07:00:04","07:00:05","07:00:05","07:00:05","07:00:06","07:00:06","07:00:06","07:00:07","07:00:07",
     "07:00:07","07:00:08","07:00:08","07:00:08","07:00:09","07:00:09","07:00:09")
sensor = c(10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,
       10001,10002,10003,10001,10002,10003,10001,10002,10003)
values = c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,1,1,0,1)
caseid = c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,2,0,1,2,0,1)

data = data.frame(time,sensor,values,caseid)

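(So data$caseid is what I want to get.)

I think this could be done with group by somehow, but I couldn't get it to work, so I went with another (sloppy) approach. This is what I came up with.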

library(dplyr)

# The column is named `sensor` (lower case) and holds numbers
data %>%
  filter(sensor == 10002) -> sensor_data_temp

sensor_data_temp$CaseID2 <- NA 
case_id = 1

for(i in 1:nrow(sensor_data_temp)){

   current_value <- sensor_data_temp[i,"values"]
   next_value <- sensor_data_temp[i+1,"values"]

   if(i+1 > nrow(sensor_data_temp)){
     break
   }

   if(current_value==0 & next_value==1 || current_value==1 & next_value==1){
     sensor_data_temp$CaseID2[i+1] <- case_id
   }
   else if(current_value==1 & next_value==0){
     sensor_data_temp$CaseID2[i+1] <- 0
     case_id = case_id +1
   }
   else{
     sensor_data_temp$CaseID2[i+1] <- 0
   }

}

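I think this is how I can get the caseid for a single sensor. But I don't know how to get every sensor back into one data frame (as shown above).

I'm sure there is a more elegant way to get what I want.

I hope someone can help me. Thanks in advance! :)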

【Comments】:

  • I think you need lead from dplyr

Tags: r group-by dplyr


【Solution 1】:

Here is one way:

library(dplyr)

mutate(group_by(arrange(data, sensor, time), sensor),
       caseID = case_when(values != 0 ~ cumsum(diff(c(0, values)) > 0),
                          TRUE ~ 0L))
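
To see what the expression computes inside one group, here is a minimal base R sketch of the same steps applied to a single sensor. The vector holds the readings of sensor 10001 in time order, copied from the question's data; the intermediate names (steps, rising, piece_no, caseid_10001) are made up for illustration and are not part of the answer.

```r
# Values of sensor 10001 in time order, taken from the question's data
values <- c(0, 1, 1, 0, 0, 0, 0, 1, 1)

# Prepend 0 so that a series starting at 1 also counts as a rising edge
steps <- diff(c(0, values))      # 0  1  0 -1  0  0  0  1  0
rising <- steps > 0              # TRUE where values switches from 0 to 1
piece_no <- cumsum(rising)       # 0 1 1 1 1 1 1 2 2

# Reset to 0 wherever the sensor reads 0
caseid_10001 <- ifelse(values != 0, piece_no, 0L)
print(caseid_10001)              # 0 1 1 0 0 0 0 2 2
```

The running count only increases at a 0 to 1 switch, so every row of the same piece shares one number; group_by(sensor) simply repeats this per sensor.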

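【Comments】:

  • Perfect! Hard to believe it is that easy ^^. I understand that case_when works vectorized, but I don't quite get how it works here, because cumsum(diff(c(0, values)) > 0) gives me a vector. How does it match this vector to the individual column entries? Thanks a lot for your answer!
  • @JonasPirner case_when is well documented; ?case_when explains it better than I can.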
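
On the question of how the vector lines up with the column entries, the following is a small sketch, not taken from the thread. It assumes dplyr is installed and uses a made-up five-element values vector; rhs, picked and by_hand are illustrative names. The right-hand side is evaluated once as a full-length vector, and case_when() then chooses position by position.

```r
library(dplyr)

# Made-up readings for a single sensor
values <- c(0, 1, 1, 0, 1)

# The right-hand side is computed once, as a full-length vector
rhs <- cumsum(diff(c(0, values)) > 0)   # 0 1 1 1 2

# case_when() picks element by element: position i gets rhs[i] when
# values[i] != 0, otherwise the fallback 0L (recycled to full length)
picked <- case_when(values != 0 ~ rhs, TRUE ~ 0L)

# The same selection written with base R indexing
by_hand <- rep(0L, length(values))
by_hand[values != 0] <- rhs[values != 0]

print(picked)    # 0 1 1 0 2
print(by_hand)   # 0 1 1 0 2
```

Inside a grouped mutate(), values is the slice of the column for the current group, so both sides have the group's length and line up one to one.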
【Solution 2】:

Here is a data.table solution:

library("data.table")

data <- data.table(
  time = c("07:00:01","07:00:01","07:00:01","07:00:02","07:00:02","07:00:02","07:00:03","07:00:03","07:00:03","07:00:04",
         "07:00:04","07:00:04","07:00:05","07:00:05","07:00:05","07:00:06","07:00:06","07:00:06","07:00:07","07:00:07",
         "07:00:07","07:00:08","07:00:08","07:00:08","07:00:09","07:00:09","07:00:09"),
  sensor = c(10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,
           10001,10002,10003,10001,10002,10003,10001,10002,10003),
  values = c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,1,1,0,1),
  caseid = c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,2,0,1,2,0,1))

data[, caseID:=ifelse(values==0, 0, cumsum(diff(c(0, values))==1)), sensor][]

And without ifelse():

data[, caseID:= { v <- rep(0, .N); v[values==1] <- cumsum(diff(c(0, values))==1)[values==1]; v }, sensor][]
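
As a quick check that the grouped numbering does not depend on rows of one sensor being adjacent, here is a small sketch on a made-up table (not the question's data) with two interleaved sensors, one of which starts at 1. It assumes data.table is installed and spells out the grouping argument as by = sensor; dt is a hypothetical name.

```r
library(data.table)

# Made-up table: two sensors, rows interleaved in time order
dt <- data.table(
  sensor = c(1, 2, 1, 2, 1, 2, 1, 2),
  values = c(1, 0, 0, 1, 1, 1, 1, 0)
)

# Number each run of 1s within a sensor; rows with value 0 get 0
dt[, caseID := ifelse(values == 0, 0L, cumsum(diff(c(0, values)) == 1)),
   by = sensor]

print(dt)
# sensor 1 reads 1, 0, 1, 1 and gets caseID 1, 0, 2, 2
# sensor 2 reads 0, 1, 1, 0 and gets caseID 0, 1, 1, 0
```

Note that the rows must already be in time order within each sensor, as they are here and in the question's data; otherwise sort first, for example with setorder().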

【Comments】:

  • Very nice! Thanks! I'll definitely have to check out data.table! I've only used the tidyverse so far ^^