【Question title】:Subset All Rows Based on First Value of Column
【Posted】:2020-04-27 10:18:59
【Question】:

I've run into a subsetting problem and I'm stuck. Here is a snippet of the data:

UniqueID MonthYear FirstObs
ABC123   OCT-18    1
ABC123   NOV-18    0
ABC123   JAN-19    0
ABC123   FEB-19    0
DEF446   MAY-19    1
DEF456   JUN-19    0
DEF456   JUL-19    0
GHI789   OCT-18    1
GHI789   NOV-18    0

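The dataset is very large and consists of runs of rows like the example above. I would like to write a subsetting expression that extracts every block of rows sharing the same UniqueID, where each block starts with FirstObs = 1, and puts the blocks together according to the month in which they originated. I would end up with something like the following: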

Subset1 (all uniqueIDs that originated in October)
UniqueID MonthYear FirstObs
ABC123   OCT-18    1
ABC123   NOV-18    0
ABC123   JAN-19    0
ABC123   FEB-19    0
GHI789   OCT-18    1
GHI789   NOV-18    0


Subset2 (all uniqueIDs that originated in May)
UniqueID MonthYear FirstObs    
DEF446   MAY-19    1
DEF456   JUN-19    0
DEF456   JUL-19    0

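Ideally there would be one subset per month, containing every block that starts with FirstObs = 1 in that month. I know I need some combination of ifelse and the subset family of functions, but I'm not sure how best to use them.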

    Tags: r if-statement subset


    【Solution 1】:

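    This may help. Group by UniqueID, keep the groups whose first FirstObs is 1 and that have more than one row, then use anti_join to collect the remaining rows: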

    library(dplyr)
    df2 <- df1 %>% 
         group_by(UniqueID) %>% 
         filter(first(FirstObs) == 1 & n() > 1)
    df3 <- anti_join(df1, df2)
    

    Or, to additionally require that the group starts in a particular month (here October):

    library(stringr)
    df2 <- df1 %>%
              group_by(UniqueID) %>% 
              filter(first(FirstObs) == 1, 
                    str_remove(first(MonthYear), "-\\d+") == "OCT")
    df3 <- anti_join(df1, df2)
    

    Or split the data into a list of data.frames:

    df1 %>%
       group_by(UniqueID) %>%
       mutate(grp = first(FirstObs) == 1 & n() > 1) %>%
       ungroup() %>%
       group_split(grp, .keep = FALSE)
    #[[1]]
    # A tibble: 3 x 3
    #  UniqueID MonthYear FirstObs
    #  <chr>    <chr>        <int>
    #1 DEF446   MAY-19           1
    #2 DEF456   JUN-19           0
    #3 DEF456   JUL-19           0
    
    #[[2]]
    # A tibble: 6 x 3
    #  UniqueID MonthYear FirstObs
    #  <chr>    <chr>        <int>
    #1 ABC123   OCT-18           1
    #2 ABC123   NOV-18           0
    #3 ABC123   JAN-19           0
    #4 ABC123   FEB-19           0
    #5 GHI789   OCT-18           1
    #6 GHI789   NOV-18           0
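
    The split above separates the data on a TRUE/FALSE flag, not on the month of origin. As a rough sketch of the "one subset per originating month" result described in the question, the base R snippet below numbers the blocks with cumsum and then splits on the month of each block's first row. This is not part of the original answer. It assumes the rows are already ordered so that each block is contiguous, that the very first row has FirstObs == 1, and that a block is defined by its FirstObs == 1 row and not by UniqueID. The names block, origin, origin_month and subsets are made up for illustration.

```r
# Same example data as in the answer
df1 <- data.frame(
  UniqueID  = c("ABC123", "ABC123", "ABC123", "ABC123",
                "DEF446", "DEF456", "DEF456", "GHI789", "GHI789"),
  MonthYear = c("OCT-18", "NOV-18", "JAN-19", "FEB-19",
                "MAY-19", "JUN-19", "JUL-19", "OCT-18", "NOV-18"),
  FirstObs  = c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Assumption: every FirstObs == 1 row opens a new block, so a running
# count of those rows gives a block number for each row.
block <- cumsum(df1$FirstObs == 1)

# MonthYear of the first row of each block, repeated for every row of that block
origin <- df1$MonthYear[df1$FirstObs == 1][block]

# Drop the "-18" / "-19" suffix so only the month abbreviation remains
origin_month <- sub("-\\d+$", "", origin)

# One data.frame per originating month
subsets <- split(df1, origin_month)
```

    With the example data, subsets$OCT holds the six ABC123 and GHI789 rows and subsets$MAY holds the three DEF rows. To keep October 2018 apart from October 2019, split on origin directly and skip the sub() step.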
    

    Data

    df1 <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123", 
    "DEF446", "DEF456", "DEF456", "GHI789", "GHI789"), MonthYear = c("OCT-18", 
    "NOV-18", "JAN-19", "FEB-19", "MAY-19", "JUN-19", "JUL-19", "OCT-18", 
    "NOV-18"), FirstObs = c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L)), 
    class = "data.frame", row.names = c(NA, 
    -9L))
    
