【Question title】: For each ID, separate groups into columns and collapse multiple value strings in R
【Posted】: 2020-04-28 21:00:02
【Question】:

I have a data frame that looks like this:

in.dat <- data.frame(ID = c("A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1"),
           DB = rep(c("bio", "bio", "func", "loc"), 2),
           val = c("IPR1", "IPR2", "s43", "333-456", 
                   "IPR7", "IPR8", "q87", "566-900"))

  ID   DB     val
1 A1  bio    IPR1
2 A1  bio    IPR2
3 A1 func     s43
4 A1  loc 333-456
5 B1  bio    IPR7
6 B1  bio    IPR8
7 B1 func     q87
8 B1  loc 566-900

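I want to turn the values of "DB" into columns and, within each ID, collapse the string values that fall into the same column, separated by ";":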

out.dat <- data.frame(ID = c("A1", "B1"),
                  bio = c("IPR1;IPR2", "IPR7;IPR8"),
                  func = c("s43", "q87"),
                  loc = c("333-456", "566-900"))

> out.dat
  ID       bio func     loc
1 A1 IPR1;IPR2  s43 333-456
2 B1 IPR7;IPR8  q87 566-900

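I have played around with pivot_wider and grouping in dplyr, but I am not quite getting what I want: one group can have multiple values, and for each ID I want to collapse them into a single cell (e.g., "IPR1;IPR2").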

Any solution would be greatly appreciated!

【Question comments】:

    Tags: r dplyr tidyverse


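    【Solution 1】:

    In recent versions of tidyr, pivot_wider has a values_fn argument: a function used to aggregate the values before reshaping. This lets you do the whole operation in a single function call.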

    library(tidyr)
    
    in.dat %>%
      pivot_wider(names_from = DB, values_from = val, 
                  values_fn = list(val = ~paste(., collapse = ";")))
    #> # A tibble: 2 x 4
    #>   ID    bio       func  loc    
    #>   <fct> <chr>     <chr> <chr>  
    #> 1 A1    IPR1;IPR2 s43   333-456
    #> 2 B1    IPR7;IPR8 q87   566-900
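
The call above passes values_fn as a named list. As a minimal sketch, assuming tidyr 1.1.0 or later (where, to my knowledge, values_fn also accepts a single function applied to every value column), the same result can be written as follows. The name out.wide is only illustrative, and in.dat is rebuilt with stringsAsFactors = FALSE so that the ID column is character regardless of the R version.

```r
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE is an assumption made
# here so the result does not depend on the R version's default.
in.dat <- data.frame(
  ID  = c("A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1"),
  DB  = rep(c("bio", "bio", "func", "loc"), 2),
  val = c("IPR1", "IPR2", "s43", "333-456", "IPR7", "IPR8", "q87", "566-900"),
  stringsAsFactors = FALSE
)

# A single function passed to values_fn is applied to every ID/DB cell,
# collapsing duplicates into one ";"-separated string before widening.
out.wide <- pivot_wider(
  in.dat,
  names_from  = DB,
  values_from = val,
  values_fn   = function(x) paste(x, collapse = ";")
)

print(out.wide)
```

On older tidyr versions the named-list form shown in the answer is the one to use.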
    

    【Comments】:

      【Solution 2】:

      We can collapse val by ID and DB, and then use pivot_wider:

      library(dplyr)
      
      in.dat %>%
        group_by(ID, DB) %>%
        summarise(val = paste0(val, collapse = ";")) %>%
        tidyr::pivot_wider(names_from = DB, values_from = val)
      
      #  ID    bio       func  loc    
      #  <fct> <chr>     <chr> <chr>  
      #1 A1    IPR1;IPR2 s43   333-456
      #2 B1    IPR7;IPR8 q87   566-900
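
The same two steps (collapse val within each ID/DB pair, then spread the pairs into columns) can also be sketched in base R with tapply, which needs no extra packages. This is an illustrative alternative, not part of the answer above; the names m and out.base are made up for the example.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the columns
# as character on every R version.
in.dat <- data.frame(
  ID  = c("A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1"),
  DB  = rep(c("bio", "bio", "func", "loc"), 2),
  val = c("IPR1", "IPR2", "s43", "333-456", "IPR7", "IPR8", "q87", "566-900"),
  stringsAsFactors = FALSE
)

# Step 1: collapse val within every ID/DB pair. The result is a character
# matrix with one row per ID and one column per DB value.
m <- tapply(in.dat$val, list(in.dat$ID, in.dat$DB), paste, collapse = ";")

# Step 2: turn the matrix into a data frame with ID as an ordinary column.
out.base <- data.frame(ID = rownames(m), m, row.names = NULL,
                       stringsAsFactors = FALSE)

print(out.base)
```

An ID/DB combination that never occurs in the input would come out as NA in this sketch.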
      

      【Comments】:

        【Solution 3】:

        You can use dcast to do this.

        in.dat <- data.frame(ID = c("A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1"),
                             DB = rep(c("bio", "bio", "func", "loc"), 2),
                             val = c("IPR1", "IPR2", "s43", "333-456", 
                                     "IPR7", "IPR8", "q87", "566-900"))
        
        library(reshape2)
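        # value.var is named explicitly so dcast does not have to guess the value column
        dcast(in.dat, ID ~ DB, fun.aggregate = paste0, collapse = ";", value.var = "val")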
        #  ID       bio func     loc
        #1 A1 IPR1;IPR2  s43 333-456
        #2 B1 IPR7;IPR8  q87 566-900
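
reshape2 is now in maintenance mode, so here is a hedged sketch of the same cast using the dcast method from data.table, assuming that package is installed. The names dt and out.dt are illustrative only, and the aggregation is written as an explicit anonymous function rather than relying on extra arguments being forwarded.

```r
library(data.table)

# Same data as in the question, built directly as a data.table.
dt <- data.table(
  ID  = c("A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1"),
  DB  = rep(c("bio", "bio", "func", "loc"), 2),
  val = c("IPR1", "IPR2", "s43", "333-456", "IPR7", "IPR8", "q87", "566-900")
)

# One row per ID, one column per DB value; duplicates within a cell are
# collapsed into a single ";"-separated string.
out.dt <- dcast(
  dt, ID ~ DB,
  value.var     = "val",
  fun.aggregate = function(x) paste(x, collapse = ";")
)

print(out.dt)
```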
        

        【Comments】:

          【Solution 4】:

          We can also use spread together with str_c:

          library(dplyr)
          library(tidyr)
          library(stringr)
          in.dat %>% 
             group_by(ID, DB) %>% 
             summarise(val = str_c(val, collapse=";")) %>% 
             spread(DB, val)
          # A tibble: 2 x 4
          # Groups:   ID [2]
          #   ID    bio       func  loc    
          #   <fct> <chr>     <chr> <chr>  
          #1 A1    IPR1;IPR2 s43   333-456
          #2 B1    IPR7;IPR8 q87   566-900
          

          【Comments】:
