How to make a loop for code including pipes

【Question title】: How to make a loop for code including pipes
【Posted】: 2021-10-24 08:08:35
【Question description】:

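I'm fairly new to R and am trying to avoid copy-pasting the same lines 20 times, which is what I'm currently doing by hand. I have a data frame with 3 variables: date.time, Depth and ms (sample):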

              date.time Depth           ms
 1: 2015-12-20 00:48:50 113.5  0.316666667
 2: 2015-12-20 01:25:50 156.0 -0.966666667
 3: 2015-12-20 01:26:50 170.5 -0.241666667
 4: 2015-12-20 01:27:50 215.5 -0.750000000
 5: 2015-12-20 01:28:50 276.5 -1.016666667
 6: 2015-12-20 01:29:50 373.0 -1.608333333
 7: 2015-12-20 01:30:50 453.0 -1.333333333
 8: 2015-12-20 01:31:50 500.0 -0.783333333
 9: 2015-12-20 01:35:50 512.0  0.241666667
10: 2015-12-20 03:53:50 285.0  0.058333333
11: 2015-12-20 03:54:50 355.0 -1.166666667
12: 2015-12-20 03:55:50 453.5 -1.641666667
13: 2015-12-20 03:57:50 526.0  0.000000000
14: 2015-12-21 15:01:50 449.5  0.016666667
15: 2015-12-21 15:02:50 467.5 -0.300000000
16: 2015-12-21 16:07:50 308.5  0.100000000
17: 2015-12-21 16:08:50 392.0 -1.391666667
18: 2015-12-21 16:09:50 491.0 -1.650000000
19: 2015-12-21 16:11:50 581.0  0.000000000
20: 2015-12-22 22:02:50 461.0  0.075000000
21: 2015-12-22 22:03:50 463.0 -0.033333333
22: 2015-12-22 22:04:50 466.0 -0.050000000
23: 2015-12-22 22:05:50 467.5 -0.025000000
24: 2015-12-22 22:06:50 468.0 -0.008333333
25: 2015-12-22 22:07:50 471.0 -0.050000000
26: 2015-12-22 22:08:50 472.5 -0.025000000
27: 2015-12-22 22:09:50 530.0 -0.958333333

So far I have done this manually, separating each dive by selecting the rows where the dive starts and ends, e.g.:

d1<- df[c(1:9),]
d2<- df[c(10:13),]
d3<- df[c(14:20),]
d4<- df[c(21:27),]

I then apply the following code to each new df I create (d1, d2, d3, d4). Here is the example for d1:

library(dplyr)      # %>%, group_by(), mutate(), lag()
library(lubridate)  # as_datetime()

d1 <- newdf[c(1:19),]
d1$date.time <- as_datetime(d1$date.time)
str(d1)

d1 %>% 
  group_by(Ptt) %>%
  mutate(
    diffMin = difftime(date.time, lag(date.time, 1, default = date.time[1]), units = "mins") %>% #calculate time diff of each row
      as.numeric() %>% #changes to numeric
      cumsum() #gets cumulative sum
  ) -> d1
d1$Divenumber <- as.character('1')

This gives me the output I want:

d1
         date.time           Depth     ms diffMin Divenumber
       <dttm>              <dbl>  <dbl>   <dbl> <chr>     
     1 2015-12-20 00:48:50  114.  0.317       0 1         
     2 2015-12-20 01:25:50  156  -0.967      37 1         
     3 2015-12-20 01:26:50  170. -0.242      38 1         
     4 2015-12-20 01:27:50  216. -0.75       39 1         
     5 2015-12-20 01:28:50  276. -1.02       40 1         
     6 2015-12-20 01:29:50  373  -1.61       41 1         
     7 2015-12-20 01:30:50  453  -1.33       42 1         
     8 2015-12-20 01:31:50  500  -0.783      43 1         
     9 2015-12-20 01:35:50  512   0.242      47 1         
    

d2
  

    date.time           Depth      ms diffMin Divenumber
      <dttm>              <dbl>   <dbl>   <dbl> <chr>     
    1 2015-12-20 03:53:50  285   0.0583       0 2         
    2 2015-12-20 03:54:50  355  -1.17         1 2         
    3 2015-12-20 03:55:50  454. -1.64         2 2         
    4 2015-12-20 03:57:50  526   0            4 2

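And so on for each new df. As you can see, building every new df this way and then binding them all together at the end takes a lot of copy-pasting. I'm sure there is a faster way to do this, but after several hours of trying I can't quite get it right. Could someone help me do this (maybe in some type of loop) so that I can run through the entire data set, assigning a new dive number to each new dive along with the time elapsed, in minutes, from the start of that dive to its end? It would also be great not to have to separate the dives manually in future. The only idea I have is some kind of code using case_when, lag and date.time to tell the dives apart, but I'm happy with any other suggestions!

Here is a dput of a subset of my data: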

structure(list(date.time = structure(c(1450572530, 1450574750, 
1450574810, 1450574870, 1450574930, 1450574990, 1450575050, 1450575110, 
1450575350, 1450583630, 1450583690, 1450583750, 1450583870, 
1450710110, 1450710170, 1450714070, 1450714130, 1450714190, 1450714310, 
1450821770, 1450821830, 1450821890, 1450821950, 1450822010, 1450822070, 
1450822130, 1450822190), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    Depth = c(113.5, 156, 170.5, 215.5, 276.5, 373, 453, 500, 
    512, 285, 355, 453.5, 526, 449.5, 467.5, 308.5, 392, 
    491, 581, 461, 463, 466, 467.5, 468, 471, 472.5, 530), ms = c(0.316666666666667, 
    -0.966666666666667, -0.241666666666667, -0.75, -1.01666666666667, 
    -1.60833333333333, -1.33333333333333, -0.783333333333333, 
    0.241666666666667, 0.0583333333333333, 
    -1.16666666666667, -1.64166666666667, 0, 0.0166666666666667, 
    -0.3, 0.1, -1.39166666666667, -1.65, 0, 0.075, -0.0333333333333333, 
    -0.05, -0.025, -0.00833333333333333, -0.05, -0.025, -0.958333333333333
    )), row.names = c(NA, -27L), class = c("data.table", "data.frame"
))

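Thanks in advance

【Comments on the question】:

How do you know when a dive starts and when it ends?
By eye, you can see it where the time difference jumps far more than the differences within a single dive. Another way is to look at "Depth": within one dive it should increase steadily, and once the depth has reached 500 it should not decrease as part of the same dive.
@Meg.abytes So mathematically, a dive ends at the highest value reached before Depth starts to drop, right? And the next dive runs from the next index onward for as long as Depth keeps increasing, and so on...
@Shibaprasadb Yes, exactly!

Tags: r loops pipe dplyr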

【Solution 1】:

Run the code Ronak posted to create the dive column, then use a pipe to group by dive and compute the cumulative dive time:

library(dplyr)

df <- df %>% 
  group_by(dive) %>%
  mutate(
    diffMin = difftime(date.time, lag(date.time, 1, default = date.time[1]), units = "mins") %>% #calculate time diff of each row
      as.numeric() %>% #changes to numeric
      cumsum()) #gets cumulative sum
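
The cumulative sum of row-to-row gaps within a dive is the same as the time elapsed since the first record of that dive, so the column could also be computed without lag() and cumsum(). The sketch below checks this on a few timestamps taken from the question's sample; the `toy` data frame and the column names `viaCumsum` and `viaFirst` are made up for illustration, and the `dive` values are filled in by hand.

```r
library(dplyr)

# Five timestamps from the question's sample, with dive numbers assigned by hand
toy <- data.frame(
  date.time = as.POSIXct(c("2015-12-20 00:48:50", "2015-12-20 01:25:50",
                           "2015-12-20 01:26:50", "2015-12-20 03:53:50",
                           "2015-12-20 03:54:50"), tz = "UTC"),
  dive = c(1, 1, 1, 2, 2)
)

toy <- toy %>%
  group_by(dive) %>%
  mutate(
    # cumulative sum of gaps between consecutive rows, in minutes
    viaCumsum = cumsum(as.numeric(difftime(
      date.time, lag(date.time, default = first(date.time)), units = "mins"))),
    # minutes elapsed since the first record of the dive
    viaFirst = as.numeric(difftime(date.time, first(date.time), units = "mins"))
  ) %>%
  ungroup()

# Both columns should agree
all.equal(toy$viaCumsum, toy$viaFirst)
```

Either form should work as long as the rows within each dive are in time order.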

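【Discussion】:

【Solution 2】:

Another approach. I used a simple while loop to do what you asked, using the dive logic you described in the comments. Let me know if you have any questions.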

#Load the data in df

#Create a list for the dive. Set the first element as 1, as it will be dive 1

dive <- c(1)

#Create a counter
dive_count <- 1

#Start the while loop from i =2, as the first one is automatically considered in dive 1

i <-2
while (i <= nrow(df)) {
  if (df$Depth[i]> df$Depth[i-1]){
    dive[i] <- dive_count
  }
  else{
    dive_count <- dive_count+1
    dive[i] <- dive_count
  }
  i<- i+1
}

df$dive <- dive

Check the final data frame:

df

             date.time Depth           ms dive
1  2015-12-20 00:48:50 113.5  0.316666667    1
2  2015-12-20 01:25:50 156.0 -0.966666667    1
3  2015-12-20 01:26:50 170.5 -0.241666667    1
4  2015-12-20 01:27:50 215.5 -0.750000000    1
5  2015-12-20 01:28:50 276.5 -1.016666667    1
6  2015-12-20 01:29:50 373.0 -1.608333333    1
7  2015-12-20 01:30:50 453.0 -1.333333333    1
8  2015-12-20 01:31:50 500.0 -0.783333333    1
9  2015-12-20 01:35:50 512.0  0.241666667    1
10 2015-12-20 03:53:50 285.0  0.058333333    2
11 2015-12-20 03:54:50 355.0 -1.166666667    2
12 2015-12-20 03:55:50 453.5 -1.641666667    2
13 2015-12-20 03:57:50 526.0  0.000000000    2
14 2015-12-21 15:01:50 449.5  0.016666667    3
15 2015-12-21 15:02:50 467.5 -0.300000000    3
16 2015-12-21 16:07:50 308.5  0.100000000    4
17 2015-12-21 16:08:50 392.0 -1.391666667    4
18 2015-12-21 16:09:50 491.0 -1.650000000    4
19 2015-12-21 16:11:50 581.0  0.000000000    4
20 2015-12-22 22:02:50 461.0  0.075000000    5
21 2015-12-22 22:03:50 463.0 -0.033333333    5
22 2015-12-22 22:04:50 466.0 -0.050000000    5
23 2015-12-22 22:05:50 467.5 -0.025000000    5
24 2015-12-22 22:06:50 468.0 -0.008333333    5
25 2015-12-22 22:07:50 471.0 -0.050000000    5
26 2015-12-22 22:08:50 472.5 -0.025000000    5
27 2015-12-22 22:09:50 530.0 -0.958333333    5
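
The same rule (a new dive starts whenever Depth does not increase relative to the previous row) can also be written without an explicit loop. This is a sketch of an equivalent vectorized form, not the answerer's code; `new_dive` is a name introduced here for illustration, and the Depth values are copied from the question's sample.

```r
# Depth values copied from the question's sample
Depth <- c(113.5, 156, 170.5, 215.5, 276.5, 373, 453, 500, 512,
           285, 355, 453.5, 526, 449.5, 467.5, 308.5, 392, 491, 581,
           461, 463, 466, 467.5, 468, 471, 472.5, 530)

# TRUE at row 1 and at every row where Depth fails to increase
new_dive <- c(TRUE, diff(Depth) <= 0)

# Running count of dive starts gives the dive number for each row
dive <- cumsum(new_dive)

table(dive)
```

With a data frame, the last step would be `df$dive <- cumsum(c(TRUE, diff(df$Depth) <= 0))`.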

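【Discussion】:

This is a great solution to my problem, and it also helped point out subtle inconsistencies in my data frame. Thanks!

【Solution 3】:

Keeping the threshold at 2 hours, you can create the dive column automatically with cumsum: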

library(dplyr)

n_seconds <- 7200 #2hours

df <- df %>% 
       mutate(dive = cumsum(difftime(date.time, 
                     lag(date.time, default = first(date.time) - n_seconds - 1), 
                     units = 'secs') > n_seconds))
df

#             date.time Depth           ms dive
#1  2015-12-20 00:48:50 113.5  0.316666667    1
#2  2015-12-20 01:25:50 156.0 -0.966666667    1
#3  2015-12-20 01:26:50 170.5 -0.241666667    1
#4  2015-12-20 01:27:50 215.5 -0.750000000    1
#5  2015-12-20 01:28:50 276.5 -1.016666667    1
#6  2015-12-20 01:29:50 373.0 -1.608333333    1
#7  2015-12-20 01:30:50 453.0 -1.333333333    1
#8  2015-12-20 01:31:50 500.0 -0.783333333    1
#9  2015-12-20 01:35:50 512.0  0.241666667    1
#10 2015-12-20 03:53:50 285.0  0.058333333    2
#11 2015-12-20 03:54:50 355.0 -1.166666667    2
#12 2015-12-20 03:55:50 453.5 -1.641666667    2
#13 2015-12-20 03:57:50 526.0  0.000000000    2
#14 2015-12-21 15:01:50 449.5  0.016666667    3
#15 2015-12-21 15:02:50 467.5 -0.300000000    3
#16 2015-12-21 16:07:50 308.5  0.100000000    3
#17 2015-12-21 16:08:50 392.0 -1.391666667    3
#18 2015-12-21 16:09:50 491.0 -1.650000000    3
#19 2015-12-21 16:11:50 581.0  0.000000000    3
#20 2015-12-22 22:02:50 461.0  0.075000000    4
#21 2015-12-22 22:03:50 463.0 -0.033333333    4
#22 2015-12-22 22:04:50 466.0 -0.050000000    4
#23 2015-12-22 22:05:50 467.5 -0.025000000    4
#24 2015-12-22 22:06:50 468.0 -0.008333333    4
#25 2015-12-22 22:07:50 471.0 -0.050000000    4
#26 2015-12-22 22:08:50 472.5 -0.025000000    4
#27 2015-12-22 22:09:50 530.0 -0.958333333    4

You can change the threshold to suit your data; I chose 2 hours based on the sample provided.
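
The time-gap threshold and the per-dive elapsed time can be combined in a single pipeline. The following is a minimal sketch under a few assumptions: the timestamps are copied from the question's dput, the 120-minute threshold mirrors the 2 hours used above, and `gap_mins` and `gap` are names introduced here for illustration.

```r
library(dplyr)

# Timestamps copied from the question's dput (seconds since 1970-01-01, UTC)
df <- data.frame(date.time = as.POSIXct(c(
  1450572530, 1450574750, 1450574810, 1450574870, 1450574930, 1450574990,
  1450575050, 1450575110, 1450575350, 1450583630, 1450583690, 1450583750,
  1450583870, 1450710110, 1450710170, 1450714070, 1450714130, 1450714190,
  1450714310, 1450821770, 1450821830, 1450821890, 1450821950, 1450822010,
  1450822070, 1450822130, 1450822190), origin = "1970-01-01", tz = "UTC"))

gap_mins <- 120  # assumed threshold between dives; tune to your data

df <- df %>%
  arrange(date.time) %>%
  mutate(
    # minutes since the previous record (NA for the first row)
    gap  = as.numeric(difftime(date.time, lag(date.time), units = "mins")),
    # a dive starts at the first row and after every gap above the threshold
    dive = cumsum(is.na(gap) | gap > gap_mins)
  ) %>%
  group_by(dive) %>%
  # minutes elapsed since the first record of each dive
  mutate(diffMin = as.numeric(difftime(date.time, first(date.time), units = "mins"))) %>%
  ungroup()

df
```

Because the grouping is done inside the pipeline, there is no need to split the data into d1, d2, ... and bind the pieces back together afterwards.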

【Discussion】:

Thanks. I ran this code and then used the extra pipe step from the other answer to get the cumulative time difference.