【Title】: Wide to long data.frame merging two pairs (key/value) of columns
【Posted】: 2017-10-30 23:15:00
【Question】:

I have this data.frame:

set.seed(28100)
label_1 <- sample(c('first_col','second_col'), 10, replace = T)
dat <- data.frame(label_1,
                  value_1 = sample(1:100, 10, replace = T),
                  label_2 = sapply(label_1, FUN = function(x) ifelse(x == 'first_col', 'second_col', 'first_col')),
                  value_2 = sample(1:100, 10, replace = T))

head(dat)
     label_1 value_1    label_2 value_2
1  first_col      88 second_col      84
2  first_col      40 second_col      30
3  first_col      98 second_col      32
4 second_col      80  first_col      64
5  first_col      34 second_col      43
6 second_col      52  first_col      10

The two key/value column pairs are not in a consistent order. I would like to reshape the same data into a long-format data.frame such as:

desired_dat <- data.frame(first_col = rep(NA, 10), 
                          second_col = rep(NA, 10))

Any suggestions for solving this with reshape2 or tidyr? How exactly?

【Discussion】:

    Tags: r tidyr reshape2


    【Solution 1】:

    How about using only dplyr (no need for tidyr etc.)?

    library(dplyr)
    dat %>% transmute(first_col = if_else(label_1 == "first_col", value_1, value_2),
                      second_col = if_else(label_2 == "second_col", value_2, value_1))
    
    #>    first_col second_col
    #> 1         88         84
    #> 2         40         30
    #> 3         98         32
    #> 4         64         80
    #> 5         34         43
    #> 6         10         52
    #> 7         23         85
    #> 8         65         86
    #> 9          4         35
    #> 10        83          8
    

    【Discussion】:

      【Solution 2】:

      This is essentially @SymbolixAU's solution, just translated to dplyr (spread() comes from tidyr):

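      library(dplyr)
      library(tidyr)  # provides spread()
      
      # Create an ID for each row; spread() needs it to tell the rows apart
      dat <- dat %>%
          mutate(id = row_number())
      
      # Stack both label/value pairs under common column names
      dat_long <- bind_rows(
          dat %>% select(id, label = label_1, value = value_1),
          dat %>% select(id, label = label_2, value = value_2)
      )
      
      # Spread the labels back out into one column each
      output <- dat_long %>%
          spread(label, value)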
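
For what it's worth, spread() has been superseded in tidyr 1.0.0 and later by pivot_wider(). Below is a minimal sketch of the same stack-then-widen idea with pivot_longer() and pivot_wider(). It assumes tidyr >= 1.0.0; `toy` and `toy_wide` are made-up names, and the values are copied from three rows of the question's output.

```r
library(dplyr)
library(tidyr)

# Small stand-in for `dat` (rows 1, 2 and 4 of the output shown in the question)
toy <- data.frame(
  label_1 = c("first_col", "first_col", "second_col"),
  value_1 = c(88, 40, 80),
  label_2 = c("second_col", "second_col", "first_col"),
  value_2 = c(84, 30, 64),
  stringsAsFactors = FALSE
)

toy_wide <- toy %>%
  mutate(id = row_number()) %>%
  # ".value" keeps `label` and `value` as columns; the numeric suffix goes to `pair`
  pivot_longer(-id, names_to = c(".value", "pair"), names_sep = "_") %>%
  select(-pair) %>%
  # one column per distinct label
  pivot_wider(names_from = label, values_from = value)

print(toy_wide)
```

The `id` column plays the same role as in the code above: without it, pivot_wider() could not tell which values belong to the same original row.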
      

      【Discussion】:

        【Solution 3】:

        I would use data.table for this, although the same principles can also be applied in the tidyverse:

        library(data.table)
        
        ## Setting as a data.table, and adding an 'id' value to keep track of rows
        setDT(dat)
        dat[, id := .I]
        
        
        ## then 'rbinding' the _1 and _2 columns together, with common column names
        dat2 <- rbindlist(
            list(
                dat[, .(id, label = label_1, value = value_1)], 
                dat[, .(id, label = label_2, value = value_2)]
                )
        )
        
        ## then reshaping from long to wide to give you your desired result
        ## (value.var is given explicitly so dcast() does not have to guess it)
        dcast(dat2, formula = id ~ label, value.var = "value")
        #     id first_col second_col
        # 1:   1        88         84
        # 2:   2        40         30
        # 3:   3        98         32
        # 4:   4        64         80
        # 5:   5        34         43
        # 6:   6        10         52
        # 7:   7        23         85
        # 8:   8        65         86
        # 9:   9         4         35
        # 10: 10        83          8
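
If no extra packages are wanted, the same two steps (stack the pairs, then go from long to wide) can probably be done in base R as well. This is only a sketch under the assumption that every row contains each label exactly once; `toy`, `long` and `wide` are made-up names, and reshape() prefixes the new columns with `value.`.

```r
# Small stand-in for `dat` (rows 1, 2 and 4 of the output shown in the question)
toy <- data.frame(
  label_1 = c("first_col", "first_col", "second_col"),
  value_1 = c(88, 40, 80),
  label_2 = c("second_col", "second_col", "first_col"),
  value_2 = c(84, 30, 64),
  stringsAsFactors = FALSE
)
toy$id <- seq_len(nrow(toy))

# Stack the _1 and _2 pairs under common column names
long <- rbind(
  data.frame(id = toy$id, label = toy$label_1, value = toy$value_1,
             stringsAsFactors = FALSE),
  data.frame(id = toy$id, label = toy$label_2, value = toy$value_2,
             stringsAsFactors = FALSE)
)

# Long to wide: one column per label, named value.<label>
wide <- reshape(long, idvar = "id", timevar = "label", direction = "wide")
print(wide)
```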
        

        【Discussion】:

          【Solution 4】:

          Since v1.9.6 (on CRAN 19 Sep 2015), data.table can melt() multiple columns simultaneously, so this can be written as a chain of data.table expressions:

          library(data.table)
          as.data.table(dat)[, rn := .I][
            , melt(.SD, measure.vars = patterns("label", "value"))][
              , dcast(.SD, rn ~ value1)][, -"rn"]
          
              first_col second_col
           1:        88         84
           2:        40         30
           3:        98         32
           4:        64         80
           5:        34         43
           6:        10         52
           7:        23         85
           8:        65         86
           9:         4         35
          10:        83          8
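
To make the chain easier to follow, here is a small sketch that runs the two steps separately and prints the intermediate result. `toy`, `long` and `wide` are made-up names. The sketch also passes `id.vars` and `value.var` explicitly, which the chain above leaves to data.table's defaults.

```r
library(data.table)

# Small stand-in for `dat` (rows 1 and 4 of the output shown in the question)
toy <- data.table(
  label_1 = c("first_col", "second_col"),
  value_1 = c(88, 80),
  label_2 = c("second_col", "first_col"),
  value_2 = c(84, 64)
)
toy[, rn := .I]

# Each pattern becomes one output column:
# value1 collects the label_* columns, value2 collects the value_* columns
long <- melt(toy, id.vars = "rn", measure.vars = patterns("label", "value"))
print(long)

# One column per label, filled with the matching numbers
wide <- dcast(long, rn ~ value1, value.var = "value2")
print(wide)
```

This is why the formula in the chain is `rn ~ value1`: after melt(), `value1` holds the labels and `value2` the numbers.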
          

          【Discussion】:

            【Solution 5】:

            Here is a possible solution, though not the most elegant one.

            myFun <- function(label1, value1, label2, value2, which_label) {
              return(ifelse(label1 == which_label, value1, value2))
            }
            
            desired_dat <- 
              data.frame(first_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'first_col'), SIMPLIFY = TRUE), 
                         second_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'second_col'), SIMPLIFY = TRUE))
            
            head(desired_dat)
            
            
              first_col second_col
            1        88         84
            2        40         30
            3        98         32
            4        64         80
            5        34         43
            6        10         52
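
Since ifelse() is already vectorized, the mapply() wrapper is probably not needed. A minimal sketch of the same logic follows, assuming every row contains each label exactly once; `pick_value`, `toy` and `toy_wide` are hypothetical names used only for illustration.

```r
# Small stand-in for `dat` (rows 1, 2 and 4 of the output shown in the question)
toy <- data.frame(
  label_1 = c("first_col", "first_col", "second_col"),
  value_1 = c(88, 40, 80),
  label_2 = c("second_col", "second_col", "first_col"),
  value_2 = c(84, 30, 64),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, take value_1 if label_1 matches, otherwise value_2
pick_value <- function(d, which_label) {
  ifelse(d$label_1 == which_label, d$value_1, d$value_2)
}

toy_wide <- data.frame(
  first_col  = pick_value(toy, "first_col"),
  second_col = pick_value(toy, "second_col")
)
print(toy_wide)
```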
            

            【Discussion】:
