[Question title]: Filter grouped data to the rows where grouping changes
[Posted]: 2018-09-18 22:12:13
[Question]:

I have these data (dput() below), where IndIDII is a grouping column with multiple MigStratFact observations, one per Year, for each IndIDII.

> head(Dat)
  IndIDII Year MigStratFact
1 BHS_376 2015      MidDist
2 BHS_376 2016      MidDist
3 BHS_376 2017      MidDist
4 BHS_376 2018    ShortDist
5 BHS_378 2015      MidDist
6 BHS_378 2016    ShortDist

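I want to filter Dat to the rows where the lead() value of MigStratFact does not match the current value, and also keep the row that holds the changed value.

With the code below, for each IndIDII, I can filter to the rows where lead(MigStratFact) != MigStratFact, but that keeps only the row before each change. I am not sure how to also retain the row after it.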

Dat %>%
  group_by(IndIDII) %>% 
  filter(lead(MigStratFact) != MigStratFact) %>% 
  as.data.frame()

The desired solution would filter to rows 3, 4, 5, 6, 8, 9, 11, 12, 15, and 16.

Thanks in advance.

Dat <- structure(list(IndIDII = c("BHS_376", "BHS_376", "BHS_376", "BHS_376", 
    "BHS_378", "BHS_378", "BHS_378", "BHS_391", "BHS_391", "BHS_394", 
    "BHS_394", "BHS_394", "BHS_395", "BHS_395", "BHS_395", "BHS_395"
    ), Year = c("2015", "2016", "2017", "2018", "2015", "2016", "2017", 
    "2015", "2016", "2016", "2017", "2018", "2015", "2016", "2017", 
    "2018"), MigStratFact = structure(c(3L, 3L, 3L, 2L, 3L, 2L, 2L, 
    2L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 2L), .Label = c("Resident", "ShortDist", 
    "MidDist", "LongDist"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -16L))

[Comments]:

    Tags: r dplyr


    [Answer 1]:

    Try adding a lag() condition, so that the row after each change is kept as well:

    Dat %>%
      group_by(IndIDII) %>% 
      filter(lead(MigStratFact) != MigStratFact | lag(MigStratFact) != MigStratFact)
    #    IndIDII Year MigStratFact
    # 1  BHS_376 2017      MidDist
    # 2  BHS_376 2018    ShortDist
    # 3  BHS_378 2015      MidDist
    # 4  BHS_378 2016    ShortDist
    # 5  BHS_391 2015    ShortDist
    # 6  BHS_391 2016      MidDist
    # 7  BHS_394 2017      MidDist
    # 8  BHS_394 2018    ShortDist
    # 9  BHS_395 2017      MidDist
    # 10 BHS_395 2018    ShortDist
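
The condition behaves sensibly at the first and last row of each group because of how NA combines with logical OR in R. Here is a minimal sketch on one group's values (BHS_376 from the question). The vector `x` and the intermediate names are illustrative and not part of the original answer.

```r
library(dplyr)

# MigStratFact values for a single group (BHS_376), as plain strings
x <- c("MidDist", "MidDist", "MidDist", "ShortDist")

# lead() pads the end with NA, lag() pads the start with NA
differs_from_next <- lead(x) != x   # FALSE FALSE  TRUE    NA
differs_from_prev <- lag(x) != x    #    NA FALSE FALSE  TRUE

# NA | TRUE is TRUE, NA | FALSE is NA
keep <- differs_from_next | differs_from_prev
keep          #    NA FALSE  TRUE  TRUE
which(keep)   # 3 4
```

filter() keeps only rows where the condition is TRUE, so the NA at position 1 is dropped just like a FALSE, and no explicit NA handling is needed.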
    

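    [Comments]:

      [Answer 2]:

      @konvas's answer is hard to top, but here is another solution. I took up the challenge of filtering by index rather than by logical condition, though I admit it is a bit harder to read.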

      Dat %>%
        group_by(IndIDII) %>% 
        filter(row_number() %in% c(a <-  which(lead(MigStratFact) != MigStratFact), a + 1))
      
      # A tibble: 10 x 3
      # Groups:   IndIDII [5]
         IndIDII Year  MigStratFact
         <chr>   <chr> <fct>       
       1 BHS_376 2017  MidDist     
       2 BHS_376 2018  ShortDist   
       3 BHS_378 2015  MidDist     
       4 BHS_378 2016  ShortDist   
       5 BHS_391 2015  ShortDist   
       6 BHS_391 2016  MidDist     
       7 BHS_394 2017  MidDist     
       8 BHS_394 2018  ShortDist   
       9 BHS_395 2017  MidDist     
      10 BHS_395 2018  ShortDist
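
If the inline assignment is hard to follow, the same index-based idea can be moved into a small function. This is a sketch rather than the answerer's code: `change_rows()` and the `orig_row` column are hypothetical names added here, and `Dat` is rebuilt to match the dput() in the question so that the result can be checked against the requested row numbers.

```r
library(dplyr)

# Same values as the dput() in the question
Dat <- data.frame(
  IndIDII = rep(c("BHS_376", "BHS_378", "BHS_391", "BHS_394", "BHS_395"),
                times = c(4, 3, 2, 3, 4)),
  Year = c("2015", "2016", "2017", "2018", "2015", "2016", "2017",
           "2015", "2016", "2016", "2017", "2018",
           "2015", "2016", "2017", "2018"),
  MigStratFact = factor(
    c("MidDist", "MidDist", "MidDist", "ShortDist",
      "MidDist", "ShortDist", "ShortDist",
      "ShortDist", "MidDist",
      "MidDist", "MidDist", "ShortDist",
      "MidDist", "MidDist", "MidDist", "ShortDist"),
    levels = c("Resident", "ShortDist", "MidDist", "LongDist")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: positions just before each change, plus the
# positions immediately after them
change_rows <- function(x) {
  before <- which(lead(x) != x)   # which() silently drops the trailing NA
  c(before, before + 1)
}

kept <- Dat %>%
  mutate(orig_row = row_number()) %>%   # remember the original row numbers
  group_by(IndIDII) %>%
  filter(row_number() %in% change_rows(MigStratFact)) %>%
  ungroup()

kept$orig_row
# 3 4 5 6 8 9 11 12 15 16
```

The retained `orig_row` values match the rows listed in the question.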
      

      [Comments]:
