【Question title】: Keep the same ids in different dataframes
【Posted】: 2020-01-23 19:07:00
【Question】:

I have three data frames:

df1 <- data.frame(id = c(1,2,3,4,5), var = c(2,4,52,2,5))
df2 <- data.frame(id  = c(1,3,4,5,6), var = c(4,5,2,6,2))
df3 <- data.frame(id = c(1,3,5), var = c(31,3,5))

How can I use the id column to keep, in each of the three data frames, only the rows whose id appears in all three?

Expected output:

df1 <- data.frame(id = c(1,3,5), var = c(2,52,5))
df2 <- data.frame(id = c(1,3,5), var = c(4,5,6))
df3 <- data.frame(id = c(1,3,5), var = c(31,3,5))

【Comments】:

    Tags: r


    【Solution 1】:

    We can use intersect to get the 'id' values common to all the datasets, and then subset each dataset on those ids

    ids <- Reduce(intersect, list(df1$id,  df2$id, df3$id))
    df1 <- subset(df1, id %in% ids)
    df2 <- subset(df2, id %in% ids)
    df3 <- subset(df3, id %in% ids)
    

    Alternatively, all the datasets can be collected into a list

    lst1 <-  mget(ls(pattern = "^df\\d+$"))
    ids <- Reduce(intersect, lapply(lst1, `[[`, 'id'))
    lapply(lst1, subset, id %in% ids)
    #$df1
    #  id var
    #1  1   2
    #3  3  52
    #5  5   5
    
    #$df2
    #  id var
    #1  1   4
    #2  3   5
    #4  5   6
    
    #$df3
    #  id var
    #1  1  31
    #2  3   3
    #3  5   5
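
Note that lapply only returns the filtered list; df1, df2 and df3 themselves are unchanged. A minimal sketch of one way to wrap the steps above and write the results back is shown here. The helper name keep_common_ids and its key argument are hypothetical, introduced only for illustration, and the data is the question's own.

```r
# Hypothetical helper (not part of the original answer): keep only the rows
# whose key value occurs in every data frame of a named list.
keep_common_ids <- function(dfs, key = "id") {
  common <- Reduce(intersect, lapply(dfs, `[[`, key))
  lapply(dfs, function(d) d[d[[key]] %in% common, , drop = FALSE])
}

df1 <- data.frame(id = c(1, 2, 3, 4, 5), var = c(2, 4, 52, 2, 5))
df2 <- data.frame(id = c(1, 3, 4, 5, 6), var = c(4, 5, 2, 6, 2))
df3 <- data.frame(id = c(1, 3, 5), var = c(31, 3, 5))

res <- keep_common_ids(list(df1 = df1, df2 = df2, df3 = df3))

# Overwrite df1, df2 and df3 in the current environment
# (the global environment when run at top level) with the filtered versions.
list2env(res, envir = environment())
```

list2env assigns each list element to a variable named after it, so the list must be named for this to work.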
    

    【Comments】:

      【Solution 2】:

      A dplyr option could be:

      library(dplyr)

      bind_rows(list(df1, df2, df3), .id = "df_id") %>%
       mutate(df_id_dist = n_distinct(df_id)) %>%
       group_by(id) %>%
       filter(n_distinct(df_id) == df_id_dist) %>%
       select(-df_id_dist) %>%
       ungroup() %>%
       group_split(df_id)
      
      [[1]]
      # A tibble: 3 x 3
        df_id    id   var
        <chr> <dbl> <dbl>
      1 1         1     2
      2 1         3    52
      3 1         5     5
      
      [[2]]
      # A tibble: 3 x 3
        df_id    id   var
        <chr> <dbl> <dbl>
      1 2         1     4
      2 2         3     5
      3 2         5     6
      
      [[3]]
      # A tibble: 3 x 3
        df_id    id   var
        <chr> <dbl> <dbl>
      1 3         1    31
      2 3         3     3
      3 3         5     5
      

      To pick up the dfs from the workspace automatically:

      mget(ls(pattern = "^df")) %>%
       bind_rows(., .id = "df_id") %>%
       mutate(df_id_dist = n_distinct(df_id)) %>%
       group_by(id) %>%
       filter(n_distinct(df_id) == df_id_dist) %>%
       select(-df_id_dist) %>%
       ungroup() %>%
       group_split(df_id)
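
The pipelines above return tibbles that carry an extra df_id column. If the original column layout should be preserved, a filtering join may be an alternative. The following is only a sketch, assuming dplyr is installed and using the question's data; the names dfs, common and filtered are made up for this example.

```r
library(dplyr)

df1 <- data.frame(id = c(1, 2, 3, 4, 5), var = c(2, 4, 52, 2, 5))
df2 <- data.frame(id = c(1, 3, 4, 5, 6), var = c(4, 5, 2, 6, 2))
df3 <- data.frame(id = c(1, 3, 5), var = c(31, 3, 5))

dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# One-column data frame holding the ids that are present in every input:
# start from the ids of the first data frame and keep only those that
# also match each following one.
common <- Reduce(
  function(x, y) semi_join(x, y, by = "id"),
  lapply(dfs, function(d) distinct(d["id"]))
)

# semi_join keeps the rows of each data frame that have a match in `common`
# and adds no columns from it.
filtered <- lapply(dfs, function(d) semi_join(d, common, by = "id"))
```

Here filtered is a named list with the same columns as the inputs, so no df_id column has to be dropped afterwards.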
      

      【Comments】:
