【Question Title】:Combining data frame based on data in each column in dplyr
【Posted】:2019-04-17 18:12:09
【Question】:

Suppose I have some network data that looks like this:

col_a <- c("A","B","C")
col_b <- c("B","A","A")
val <- c(1,3,7)
df <- data.frame(col_a, col_b, val)
df

  col_a col_b val
1     A     B   1
2     B     A   3
3     C     A   7

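This could be a network, where val is the weight of the edge between two nodes. However, I want to add together the weights for A to B and for B to A, to get the following result: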

new_col_a <- c("A", "A")
new_col_b <- c("B", "C")
new_val <- c(4,7)
want_df <- data.frame(new_col_a, new_col_b, new_val)
want_df

  new_col_a new_col_b new_val
1         A         B       4
2         A         C       7

Is there a way to do this with dplyr?

【Comments】:

    Tags: r dplyr


    【Solution 1】:

    One possibility with dplyr:

    library(dplyr)

    df %>%
     mutate_if(is.factor, as.character) %>%
     group_by(grp = paste(pmin(col_a, col_b), pmax(col_a, col_b), sep = "_")) %>%
     summarise(val = sum(val))
    
      grp     val
      <chr> <dbl>
    1 A_B       4
    2 A_C       7
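
The grouping key above relies on pmin() and pmax() working element-wise on character vectors, so each pair ends up in the same order no matter which column a node started in. As a minimal sketch in base R, using the question's data (the names lo, hi, key and totals are only illustrative and not part of the original answer):

```r
col_a <- c("A", "B", "C")
col_b <- c("B", "A", "A")
val   <- c(1, 3, 7)

# Element-wise smaller and larger label of each pair
lo <- pmin(col_a, col_b)   # "A" "A" "A"
hi <- pmax(col_a, col_b)   # "B" "B" "C"

# Order-independent key: A-B and B-A both become "A_B"
key <- paste(lo, hi, sep = "_")

# Sum the weights per key
totals <- tapply(val, key, sum)
totals
```

The columns are plain character vectors here; the answer converts factor columns with mutate_if(is.factor, as.character) before making the same comparison.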
    

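    Or with the tidyverse, using an idea similar to @Sonny's:

    library(tidyr)
    library(purrr)

    df %>%
     mutate_if(is.factor, as.character) %>%
     nest(data = c(col_a, col_b)) %>%  # tidyr >= 1.0 syntax; older versions used nest(col_a, col_b)
     group_by(grp = map_chr(data, function(x) paste(sort(unlist(x)), collapse = "_"))) %>%
     summarise(val = sum(val))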
    

    If you also want to split the key into two columns (this step also requires tidyr):

    df %>%
     mutate_if(is.factor, as.character) %>%
     group_by(grp = paste(pmin(col_a, col_b), pmax(col_a, col_b), sep = "_")) %>%
     summarise(val = sum(val)) %>%
     separate(grp, c("new_col_a", "new_col_b"), sep = "_")
    
      new_col_a new_col_b   val
      <chr>     <chr>     <dbl>
    1 A         B             4
    2 A         C             7
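
One caveat with pasting a key and splitting it again is that it assumes no node name contains the separator "_". A possible variant, sketched here under the assumption that dplyr 1.0 or later is installed (for the .groups argument), groups directly on the two ordered columns so that no separator is needed. The name res is only illustrative:

```r
library(dplyr)

df <- data.frame(
  col_a = c("A", "B", "C"),
  col_b = c("B", "A", "A"),
  val   = c(1, 3, 7),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Create the ordered pair as two grouping columns instead of one pasted key
  group_by(new_col_a = pmin(col_a, col_b),
           new_col_b = pmax(col_a, col_b)) %>%
  summarise(new_val = sum(val), .groups = "drop")

res
```

This should give the same two rows as the separate() version, with the columns already named as in want_df.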
    

    Or, for the second approach:

    df %>%
     mutate_if(is.factor, as.character) %>%
     nest(data = c(col_a, col_b)) %>%  # tidyr >= 1.0 syntax; older versions used nest(col_a, col_b)
     group_by(grp = map_chr(data, function(x) paste(sort(unlist(x)), collapse = "_"))) %>%
     summarise(val = sum(val)) %>%
     separate(grp, c("new_col_a", "new_col_b"), sep = "_")
    

    【Comments】:

      【Solution 2】:

      You can use dplyr for this, together with purrr and tidyr:

      df <- data.frame(col_a, col_b, val, stringsAsFactors = FALSE)
      
      library(dplyr)
      library(tidyr)
      df %>% 
        mutate(
          pair = purrr::pmap_chr(
            .l = list(from = col_a, to = col_b),
            .f = function(from, to) paste(sort(c(from, to)), collapse = "_")
          )
        ) %>%
        group_by(pair) %>%
        summarise(new_val = sum(val)) %>%
        separate(pair, c("new_col_a", "new_col_b"), sep = "_")
        # A tibble: 2 x 3
        new_col_a new_col_b new_val
        <chr>     <chr>       <dbl>
      1 A         B               4
      2 A         C               7
      

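      Similar to one of my previous answers.

      【Comments】:

        【Solution 3】:

        If you first reshape your data into a tidy, long form, this becomes fairly simple: convert to long form, sort the column labels independently of the values, spread back to wide form, group, and sum val.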

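        df %>%
            mutate(id = row_number()) %>%   # row id keeps each edge's two endpoints together
            gather(grp, col, -val, -id) %>%
            group_by(id) %>%
            mutate(col = sort(col)) %>%     # sort the endpoints within each edge, not across all rows
            ungroup() %>%
            spread(grp, col) %>%
            group_by(col_a, col_b) %>%
            summarize(val = sum(val))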
        
        ## A tibble: 2 x 3
        ## Groups:   col_a [?]
        #  col_a col_b   val
        #  <chr> <chr> <dbl>
        #1 A     B         4
        #2 A     C         7
        

        【Comments】:
