In R, converting a wide data frame to long while retaining some information

【Question title】: In R, converting Wide dataframe to Long while retaining some information
【Posted】: 2019-09-20 11:24:32
【Question】:

I have a large dataset that I need to reshape, but I don't know how to do it. Suppose my study has 2 participants.

football_enjoyment <- c(5,3)
basketball_enjoyment <- c(5,5)
football_participation <- c(1,2)
basketball_participation <- c(1,3)

df<- data.frame(football_enjoyment,football_participation, 
                basketball_enjoyment,basketball_participation)
df$id <- seq.int(nrow(df))
df

##  football_enjoyment football_participation basketball_enjoyment basketball_participation id
#                  5                      1                    5                        1    1
#                  3                      2                    5                        3    2

I would like it to look like this:

sports <- c("football","football", "basketball","basketball")
enjoyment_score <- c(5,3,5,5)
participation_score <- c(1,2,1,3)

id <- c(1,2)

df2 <- data.frame(sports, enjoyment_score,participation_score, id)
df2

##    sports    enjoyment_score    participation_score id
#   football               5                   1        1
#   football               3                   2        2
# basketball               5                   1        1
# basketball               5                   3        2

I'm stuck on the structure; the column/row names are for demonstration purposes only.

【Comments】:

Possible duplicate of "r - gather multiple columns in multiple key columns with tidyr"

Tags: r

【Solution 1】:

With the tidyverse you can do:

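library(tidyverse)
library(reshape2)

# Stack every column except id, split the "sport_measure" names,
# then cast the measures back out into their own columns
df %>% gather("variable", "value", - id) %>%
    separate(variable, into = c("sports", "variable"), sep = "_") %>%
    dcast(id + sports ~ variable, value.var = "value") %>% arrange(desc(sports))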

#  id     sports enjoyment participation
#1  1   football         5             1
#2  2   football         3             2
#3  1 basketball         5             1
#4  2 basketball         5             3
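
The pipeline above mixes tidyr with reshape2 for the final casting step. As a hedged alternative, the same gather-then-split idea can stay within tidyr by swapping `dcast()` for `pivot_wider()` (this assumes tidyr 1.0.0 or later). The names `long` and `measure` are illustrative and not from the original answer.

```r
library(tidyr)
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  football_enjoyment       = c(5, 3),
  football_participation   = c(1, 2),
  basketball_enjoyment     = c(5, 5),
  basketball_participation = c(1, 3)
)
df$id <- seq_len(nrow(df))

long <- df %>%
  # one row per id and original column
  gather("variable", "value", -id) %>%
  # "football_enjoyment" -> sports = "football", measure = "enjoyment"
  separate(variable, into = c("sports", "measure"), sep = "_") %>%
  # one column per measure; replaces the reshape2::dcast() step
  pivot_wider(names_from = measure, values_from = value) %>%
  # football first, then by participant
  arrange(desc(sports), id)

long
```

Splitting on `"_"` only works here because each column name contains exactly one underscore; names with more would need a different `sep` or a regular expression.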

Alternatively, in base R you can do:

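# With v.names supplied, a varying vector is read in the order x.1, y.1, x.2, y.2:
# all measures for the first sport, then all measures for the second
df2 <- reshape(df, varying = c("football_enjoyment", "football_participation", "basketball_enjoyment", "basketball_participation"), 
   direction = "long", 
   idvar = "id", 
   sep = "_", 
   timevar = "sports", 
   times = c("football", "basketball"), v.names = c('enjoyment', 'participation'))
# Drop the "1.football"-style row names that reshape() generates
rownames(df2) <- NULL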

#  id     sports enjoyment participation
#1  1   football         5             1
#2  2   football         3             2
#3  1 basketball         5             1
#4  2 basketball         5             3
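
Because `reshape()` pairs columns with `v.names` purely by position, a wrong column order in `varying` would silently mix up the measures. A minimal sketch of a more explicit form passes `varying` as a list with one element per output column, each listing its source columns in the same order as `times`. The name `long_base` is illustrative.

```r
# Same toy data as in the question
df <- data.frame(
  football_enjoyment       = c(5, 3),
  football_participation   = c(1, 2),
  basketball_enjoyment     = c(5, 5),
  basketball_participation = c(1, 3)
)
df$id <- seq_len(nrow(df))

long_base <- reshape(
  df,
  direction = "long",
  idvar     = "id",
  timevar   = "sports",
  times     = c("football", "basketball"),
  v.names   = c("enjoyment", "participation"),
  # one list element per v.names entry; order within each matches `times`
  varying   = list(
    c("football_enjoyment",     "basketball_enjoyment"),
    c("football_participation", "basketball_participation")
  )
)
rownames(long_base) <- NULL

long_base
```

With the list form, the mapping from wide columns to long columns is spelled out, at the cost of a little more typing.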


【Solution 2】:

tidyr 1.0.0 added a pivot_longer() function that does this:

library(tidyr)

football_enjoyment <- c(5,3)
basketball_enjoyment <- c(5,5)
football_participation <- c(1,2)
basketball_participation <- c(1,3)

df<- data.frame(football_enjoyment,football_participation, 
                basketball_enjoyment,basketball_participation)
df$id <- seq.int(nrow(df))
df
#>   football_enjoyment football_participation basketball_enjoyment
#> 1                  5                      1                    5
#> 2                  3                      2                    5
#>   basketball_participation id
#> 1                        1  1
#> 2                        3  2

df %>% pivot_longer(-id, names_to = c("sports",".value"), names_sep = "_")
#> # A tibble: 4 x 4
#>      id sports     enjoyment participation
#>   <int> <chr>          <dbl>         <dbl>
#> 1     1 football           5             1
#> 2     1 basketball         5             1
#> 3     2 football           3             2
#> 4     2 basketball         5             3

Created on 2019-09-20 by the reprex package (v0.3.0)
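
In the `pivot_longer()` call, the special name `".value"` in `names_to` tells tidyr that the second part of each column name (`enjoyment`, `participation`) should become an output column rather than a value in a key column. The result is sorted by `id`, whereas the layout requested in the question is grouped by sport and uses `_score` suffixes. A hedged sketch of getting to that exact layout, assuming dplyr is also available:

```r
library(tidyr)
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  football_enjoyment       = c(5, 3),
  football_participation   = c(1, 2),
  basketball_enjoyment     = c(5, 5),
  basketball_participation = c(1, 3)
)
df$id <- seq_len(nrow(df))

df2 <- df %>%
  # "football_enjoyment" -> sports = "football", value goes to column "enjoyment"
  pivot_longer(-id, names_to = c("sports", ".value"), names_sep = "_") %>%
  # column names used in the question's target data frame
  rename(enjoyment_score = enjoyment,
         participation_score = participation) %>%
  # football rows first, then by participant
  arrange(desc(sports), id) %>%
  # column order used in the question's target data frame
  select(sports, enjoyment_score, participation_score, id)

df2
```

The `rename()`, `arrange()`, and `select()` steps are cosmetic; the reshaping itself is done entirely by `pivot_longer()`.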
