Wide to long data.frame merging two pairs (key/value) of columns: answers

【Question title】: Wide to long data.frame merging two pairs (key/value) of columns
【Posted】: 2017-10-30 23:15:00
【Question】:

I have this data.frame:

set.seed(28100)
label_1 <- sample(c('first_col','second_col'), 10, replace = TRUE)
dat <- data.frame(label_1,
                  value_1 = sample(1:100, 10, replace = TRUE),
                  # label_2 is always the other label from label_1
                  label_2 = ifelse(label_1 == 'first_col', 'second_col', 'first_col'),
                  value_2 = sample(1:100, 10, replace = TRUE))

head(dat)
     label_1 value_1    label_2 value_2
1  first_col      88 second_col      84
2  first_col      40 second_col      30
3  first_col      98 second_col      32
4 second_col      80  first_col      64
5  first_col      34 second_col      43
6 second_col      52  first_col      10

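The two key/value column pairs are not in a consistent order from row to row. I would like to reshape the same data into a "long" data.frame with one column per key, such as: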

desired_dat <- data.frame(first_col = rep(NA, 10), 
                          second_col = rep(NA, 10))

Would you recommend reshape2 or tidyr for this problem? If so, how exactly?

【Comments】:

Tags: r tidyr reshape2

【Solution 1】:

How about using only dplyr (no need for tidyr, etc.)?

library(dplyr)
dat %>% transmute(first_col = if_else(label_1 == "first_col", value_1, value_2),
                  second_col = if_else(label_2 == "second_col", value_2, value_1))

#>    first_col second_col
#> 1         88         84
#> 2         40         30
#> 3         98         32
#> 4         64         80
#> 5         34         43
#> 6         10         52
#> 7         23         85
#> 8         65         86
#> 9          4         35
#> 10        83          8
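
The if_else() approach relies on every row holding each of the two labels exactly once. A minimal sketch of checking that assumption first, using a small made-up data.frame (the name toy and its values are illustrative, not taken from the question):

```r
# Toy data in the same shape as `dat` (values invented for illustration)
toy <- data.frame(
  label_1 = c("first_col", "second_col"),
  value_1 = c(1L, 2L),
  label_2 = c("second_col", "first_col"),
  value_2 = c(3L, 4L),
  stringsAsFactors = FALSE
)

valid_labels <- c("first_col", "second_col")

# A row is usable when both labels are known and differ from each other
labels_ok <- toy$label_1 %in% valid_labels &
  toy$label_2 %in% valid_labels &
  toy$label_1 != toy$label_2

# Stop early if any row breaks the assumption
stopifnot(all(labels_ok))
```

If the check fails, the rows where labels_ok is FALSE are the ones to inspect before reshaping.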

【Discussion】:

【Solution 2】:

This is basically @SymbolixAU's solution, just translated to dplyr (plus tidyr for the final step):

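library(dplyr)
library(tidyr)  # spread() comes from tidyr

# Create an ID for each row: probably not necessary but useful to check
dat <- dat %>%
    mutate(id = row_number())

# Stack the two key/value pairs under common column names
dat_long <- bind_rows(
    dat %>% select(id, label = label_1, value = value_1),
    dat %>% select(id, label = label_2, value = value_2)
)

# Spread the labels back out into one column each
output <- dat_long %>%
    spread(label, value)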
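
tidyr 1.0.0 and later provide pivot_longer() and pivot_wider(), which supersede gather() and spread(). A minimal sketch of the same stack-then-widen idea, assuming tidyr >= 1.0.0 and using a small made-up data.frame (toy and the helper column pair are illustrative names, not part of the original answer):

```r
library(dplyr)
library(tidyr)

# Toy data in the same shape as `dat` (values invented for illustration)
toy <- data.frame(
  label_1 = c("first_col", "second_col"),
  value_1 = c(1L, 2L),
  label_2 = c("second_col", "first_col"),
  value_2 = c(3L, 4L),
  stringsAsFactors = FALSE
)

toy_wide <- toy %>%
  mutate(id = row_number()) %>%
  # "label_1" splits into ".value" = label and pair = "1", and so on,
  # so both pairs are stacked into shared `label` and `value` columns
  pivot_longer(cols = -id,
               names_to = c(".value", "pair"),
               names_sep = "_") %>%
  select(-pair) %>%
  # One column per label, one row per id
  pivot_wider(names_from = label, values_from = value)
```

The ".value" sentinel keeps the character labels and the integer values in separate columns, so the two types are never mixed during the stacking step.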

【Discussion】:

【Solution 3】:

I would use data.table for this, although the same principle can be applied in the tidyverse:

library(data.table)

## Setting as a data.table, and adding an 'id' value to keep track of rows
setDT(dat)
dat[, id := .I]


## then 'rbinding' the _1 and _2 columns together, with common column names
dat2 <- rbindlist(
    list(
        dat[, .(id, label = label_1, value = value_1)], 
        dat[, .(id, label = label_2, value = value_2)]
        )
)

## then reshaping from long to wide to give you your desired result
## (value.var is named explicitly so dcast() does not have to guess it)
dcast(dat2, formula = id ~ label, value.var = "value")
#     id first_col second_col
# 1:   1        88         84
# 2:   2        40         30
# 3:   3        98         32
# 4:   4        64         80
# 5:   5        34         43
# 6:   6        10         52
# 7:   7        23         85
# 8:   8        65         86
# 9:   9         4         35
# 10: 10        83          8

【Discussion】:

【Solution 4】:

Since v1.9.6 (on CRAN 19 Sep 2015), data.table can melt() multiple columns simultaneously. So this can be done in a chain of data.table expressions:

library(data.table)
as.data.table(dat)[, rn := .I][
  , melt(.SD, measure.vars = patterns("label", "value"))][
    , dcast(.SD, rn ~ value1, value.var = "value2")][, -"rn"]

    first_col second_col
 1:        88         84
 2:        40         30
 3:        98         32
 4:        64         80
 5:        34         43
 6:        10         52
 7:        23         85
 8:        65         86
 9:         4         35
10:        83          8
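
To make the intermediate step easier to follow, here is a minimal sketch of the same melt()/dcast() pair written as two separate calls, with value.name used to give the melted columns readable names instead of the default value1 and value2. The table toy and its values are made up for illustration:

```r
library(data.table)

# Toy data in the same shape as `dat` (values invented for illustration)
toy <- data.table(
  label_1 = c("first_col", "second_col"),
  value_1 = c(1L, 2L),
  label_2 = c("second_col", "first_col"),
  value_2 = c(3L, 4L)
)
toy[, rn := .I]  # row number, used as the id when reshaping

# Melt both column groups at once: one pattern per output column
toy_long <- melt(toy,
                 id.vars = "rn",
                 measure.vars = patterns("^label", "^value"),
                 value.name = c("label", "value"))

# Cast back to wide: one column per label
toy_wide <- dcast(toy_long, rn ~ label, value.var = "value")
```

Here toy_long has one row per original row and pair, with columns rn, variable, label and value; the variable column records which pair (1 or 2) each row came from.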

【Discussion】:

【Solution 5】:

Here is one possible solution, though not the most elegant.

myFun <- function(label1, value1, label2, value2, which_label) {
  return(ifelse(label1 == which_label, value1, value2))
}

desired_dat <- 
  data.frame(first_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'first_col'), SIMPLIFY = TRUE), 
             second_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'second_col'), SIMPLIFY = TRUE))

head(desired_dat)


  first_col second_col
1        88         84
2        40         30
3        98         32
4        64         80
5        34         43
6        10         52
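
Because ifelse() is already vectorized, the same selection can be done without mapply(). A minimal base R sketch under that observation, using a small made-up data.frame (toy and the helper pick_value are hypothetical names, not from the original answer):

```r
# Toy data in the same shape as `dat` (values invented for illustration)
toy <- data.frame(
  label_1 = c("first_col", "second_col"),
  value_1 = c(1L, 2L),
  label_2 = c("second_col", "first_col"),
  value_2 = c(3L, 4L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, take value_1 when label_1 matches
# `which_label`, otherwise take value_2
pick_value <- function(d, which_label) {
  ifelse(d$label_1 == which_label, d$value_1, d$value_2)
}

toy_wide <- data.frame(
  first_col  = pick_value(toy, "first_col"),
  second_col = pick_value(toy, "second_col")
)
```

Like the mapply() version, this assumes that whenever label_1 is not the requested label, label_2 is.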

【Discussion】: