【Question title】: Why does tidyverse group_by behave unexpectedly after an R update?
【Posted】: 2021-06-20 17:49:39
【Question】:

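This used to work fine, and then I updated R! After the update, group_by treats every row as its own group. With the example dataset dtt below, if I filter the data down to a single group and run the code, it works as expected. If I run the same code on all groups, it does not.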

Here are the working and non-working code, followed by the data.

# Filter dtt down to a single (x, y) group and run the code; it then works as expected, as shown below

dtt_xy<-dtt%>%
        filter(x==-121 & y == 65)
dtt_xy

dtt_output <- dtt_xy%>%
  group_by(x, y) %>%
  group_by(grp = cumsum(c(TRUE, diff(Date) != 1)), .add = TRUE)
dtt_output #Expected output

# Now, if the same code is run on the whole dataset, i.e. dtt, it does not work

dtt_output <- dtt%>%
  group_by(x, y) %>%
  group_by(grp = cumsum(c(TRUE, diff(Date) != 1)), .add = TRUE)
dtt_output # Not the expected output; the expectation is 35 groups
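
For reference, the `cumsum(c(TRUE, diff(Date) != 1))` expression used above is a run-id idiom: it starts a new id whenever the gap to the previous date is not exactly one day. A minimal base R sketch, using made-up dates (the `dates` vector is an assumption for illustration, not part of dtt):

```r
# Hypothetical toy dates: two runs of consecutive days.
dates <- as.Date(c("2021-01-01", "2021-01-02", "2021-01-03",
                   "2021-01-10", "2021-01-11"))

# diff() gives the gap in days between neighbouring dates;
# any gap other than 1 marks the start of a new run.
# The leading TRUE makes the first row start run number 1.
starts_new_run <- c(TRUE, diff(dates) != 1)

# cumsum() over the logical vector turns run starts into run ids.
run_id <- cumsum(starts_new_run)
run_id
# [1] 1 1 1 2 2
```

The idiom only gives meaningful run ids when the dates it sees are sorted and belong to a single series, which is why it matters whether it is evaluated per (x, y) group or over the whole table.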

Sample data

dtt<-structure(list(x = c(-121, -120, -121, -120, -121, -120, -121, 
-120, -121, -120, -121, -120, -121, -120, -121, -120, -121, -120, 
-121, -120, -121, -121, -121, -120, -121, -120, -121, -120, -121, 
-120, -121, -120, -121, -120, -121, -120, -121, -120, -121, -120, 
-121, -120, -121, -120, -121, -120, -121, -120, -121, -120, -121, 
-120, -121, -120, -121, -120, -121, -120, -121, -120, -121, -120, 
-121, -120, -121, -120, -121, -120, -121, -120, -121, -120, -121, 
-120, -121, -120, -121, -120, -121, -120, -121, -120, -121, -120, 
-121, -120, -120, -120, -121, -120, -121, -120, -121, -120, -121, 
-120, -121, -120, -121, -120, -121, -120, -121, -120, -121, -120, 
-121, -120, -121, -120, -121, -120, -121, -120, -121, -120, -121, 
-120, -121, -120, -121, -120, -121, -120, -121, -120, -121, -120, 
-121, -120, -121, -120, -121, -120, -121, -120, -121, -120, -121, 
-120, -121, -120, -121, -120, -121, -120, -121, -120, -121, -120
), y = c(65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 63, 63, 65, 
65, 63, 63, 65, 65, 63, 63, 65, 63, 65, 65, 63, 63, 65, 65, 63, 
63, 65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 63, 
63, 65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 63, 
63, 65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 65, 65, 65, 65, 65, 
65, 63, 63, 65, 65, 63, 63, 63, 63, 65, 63, 63, 63, 65, 65, 63, 
63, 65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 63, 63, 63, 63, 65, 
65, 65, 65, 65, 65, 65, 65, 65, 65, 63, 63, 65, 65, 63, 63, 65, 
65, 65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 63, 63, 65, 65, 63, 
63, 65, 65, 63, 63, 65, 65, 63, 63), Date = structure(c(5123, 
5123, 5123, 5123, 5124, 5124, 5124, 5124, 5125, 5125, 5125, 5125, 
5126, 5126, 5126, 5126, 5127, 5127, 5127, 5127, 5128, 5128, 5177, 
5177, 5177, 5177, 5178, 5178, 5178, 5178, 5179, 5179, 5179, 5179, 
5180, 5180, 5180, 5180, 5181, 5181, 5181, 5181, 5200, 5200, 5200, 
5200, 5201, 5201, 5201, 5201, 5202, 5202, 5202, 5202, 5203, 5203, 
5203, 5203, 5204, 5204, 5204, 5204, 5205, 5205, 5205, 5205, 5206, 
5206, 5206, 5206, 5238, 5238, 5239, 5239, 5240, 5240, 5273, 5273, 
5273, 5273, 5274, 5274, 5274, 5274, 5319, 5319, 5320, 5325, 5326, 
5326, 5327, 5327, 5327, 5327, 5328, 5328, 5328, 5328, 5329, 5329, 
5329, 5329, 5330, 5330, 5330, 5330, 5331, 5331, 5344, 5344, 5345, 
5345, 5381, 5381, 5382, 5382, 5383, 5383, 5383, 5383, 5384, 5384, 
5384, 5384, 5401, 5401, 5402, 5402, 5402, 5402, 5403, 5403, 5403, 
5403, 5404, 5404, 5404, 5404, 5405, 5405, 5405, 5405, 5406, 5406, 
5406, 5406, 5407, 5407, 5407, 5407), class = "Date")), row.names = c(NA, 
-150L), class = c("tbl_df", "tbl", "data.frame"))

Session info

R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.5     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3    
[7] tibble_3.1.0    ggplot2_3.3.3   tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6       cellranger_1.1.0 pillar_1.5.1     compiler_4.0.4   dbplyr_2.1.0     tools_4.0.4     
 [7] jsonlite_1.7.2   lubridate_1.7.10 lifecycle_1.0.0  gtable_0.3.0     pkgconfig_2.0.3  rlang_0.4.10    
[13] reprex_1.0.0     cli_2.3.1        rstudioapi_0.13  DBI_1.1.1        haven_2.3.1      withr_2.4.1     
[19] xml2_1.3.2       httr_1.4.2       fs_1.5.0         generics_0.1.0   vctrs_0.3.6      hms_1.0.0       
[25] grid_4.0.4       tidyselect_1.1.0 glue_1.4.2       R6_2.5.0         fansi_0.4.2      readxl_1.3.1    
[31] modelr_0.1.8     magrittr_2.0.1   backports_1.2.1  scales_1.1.1     ellipsis_0.3.1   rvest_1.0.0     
[37] assertthat_0.2.1 colorspace_2.0-0 utf8_1.2.1       stringi_1.5.3    munsell_0.5.0    broom_0.7.5     
[43] crayon_1.4.1

【Comments】:

Can you explain what you are trying to do?
See the revised question, which shows my problem. Thanks.

Tags: r dplyr tidyverse

【Solution 1】:

Does this answer your question?

dt %>%
  group_by(x, y) %>%
  mutate(grp = cumsum(c(TRUE, diff(Date) != 1)), .add = TRUE) %>%
  mutate(event = if (n() >= 5)
    cur_group_id()[n() >= 5]
    else    NA)

【Comments】:

Thanks, but no, it does not. It gives 150 groups (each row as its own group), whereas I expect 35 groups based on the combination of x, y and grp.

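【Solution 2】:

Yes, this is one of the recent changes in dplyr that affects nested group_by calls. An issue was previously opened for this, but it was closed, and the behavior seems unlikely to change.

The workaround is to create the new column with mutate and then use it in group_by.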

library(dplyr)

dtt%>%
  group_by(x, y) %>%
  mutate(grp = cumsum(c(TRUE, diff(Date) != 1))) %>%
  group_by(grp, .add = TRUE)

# A tibble: 150 x 4
# Groups:   x, y, grp [35]
#       x     y Date         grp
#   <dbl> <dbl> <date>     <int>
# 1  -121    65 1984-01-11     1
# 2  -120    65 1984-01-11     1
# 3  -121    63 1984-01-11     1
# 4  -120    63 1984-01-11     1
# 5  -121    65 1984-01-12     1
# 6  -120    65 1984-01-12     1
# 7  -121    63 1984-01-12     1
# 8  -120    63 1984-01-12     1
# 9  -121    65 1984-01-13     1
#10  -120    65 1984-01-13     1
# … with 140 more rows
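
To see the difference on something smaller, here is a minimal sketch with made-up data (the `toy` tibble and its `key` column are assumptions for illustration, not part of dtt). As far as I understand the dplyr documentation, an expression written inside group_by() is evaluated on the ungrouped data frame, while the same expression inside mutate() is evaluated within the existing groups:

```r
library(dplyr)

# Hypothetical toy data: two keys sharing the same two runs of
# consecutive days, interleaved row by row like the question's dtt.
toy <- tibble(
  key  = rep(c("a", "b"), times = 5),
  Date = rep(as.Date("2021-01-01") + c(0, 1, 2, 9, 10), each = 2)
)

# Expression inside group_by(): in the dplyr versions discussed here
# it sees the whole Date column, so the repeated dates break up the runs.
inside <- toy %>%
  group_by(key) %>%
  group_by(grp = cumsum(c(TRUE, diff(Date) != 1)), .add = TRUE)

# Expression inside mutate(): evaluated separately for each key,
# then the resulting column is added as an extra grouping variable.
via_mutate <- toy %>%
  group_by(key) %>%
  mutate(grp = cumsum(c(TRUE, diff(Date) != 1))) %>%
  group_by(grp, .add = TRUE)

n_groups(inside)      # depends on the installed dplyr version
n_groups(via_mutate)  # 2 keys x 2 runs of consecutive days
```

Comparing the two n_groups() results on your own installation is a quick way to check which behavior your dplyr version has; the mutate() variant gives one group per key and run of consecutive days either way.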

【Comments】:

Awesome, you are a lifesaver!