【Question title】: In R, converting a wide data frame to long while retaining some information
【Posted】: 2019-09-20 11:24:32
【Question】:

I have a large dataset that I need to reshape, but I don't know how to do it. Suppose my study has 2 participants.

football_enjoyment <- c(5,3)
basketball_enjoyment <- c(5,5)
football_participation <- c(1,2)
basketball_participation <- c(1,3)

df<- data.frame(football_enjoyment,football_participation, 
                basketball_enjoyment,basketball_participation)
df$id <- seq.int(nrow(df))
df

##  football_enjoyment football_participation basketball_enjoyment basketball_participation id
#                  5                      1                    5                        1    1
#                  3                      2                    5                        3    2

I would like it to look like this:

sports <- c("football","football", "basketball","basketball")
enjoyment_score <- c(5,3,5,5)
participation_score <- c(1,2,1,3)

id <- c(1,2)

df2 <- data.frame(sports, enjoyment_score,participation_score, id)
df2

##    sports    enjoyment_score    participation_score id
#   football               5                   1        1
#   football               3                   2        2
# basketball               5                   1        1
# basketball               5                   3        2

I am stuck on the structure; the column/row names are for demonstration purposes only.

【Comments】:

Tags: r


【Solution 1】:

With the tidyverse (plus reshape2 for dcast) you can do:

library(tidyverse)
library(reshape2)

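# Stack every column except id, split names such as "football_enjoyment"
# into sport and measure, then spread the measures back into columns.
df %>% gather("variable", "value", - id) %>%
    separate(variable, into = c("sports", "variable"), sep = "_") %>%
    dcast(id + sports ~ variable, value.var = "value") %>% arrange(desc(sports))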

#  id     sports enjoyment participation
#1  1   football         5             1
#2  2   football         3             2
#3  1 basketball         5             1
#4  2 basketball         5             3

Alternatively, in base R you can do:

df2 <- reshape(df, varying = c("football_enjoyment", "football_participation", "basketball_enjoyment", "basketball_participation"), 
   direction = "long", 
   idvar = "id", 
   sep = "_", 
   timevar = "sports", 
   times = c("football", "basketball"), v.names = c('enjoyment', 'participation'))
rownames(df2) <- NULL

#  id     sports enjoyment participation
#1  1   football         5             1
#2  2   football         3             2
#3  1 basketball         5             1
#4  2 basketball         5             3
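The base call above relies on the order of the names in `varying`: when `varying` is a plain vector and `v.names` is supplied, `reshape` expects the columns ordered as x.1, y.1, x.2, y.2 (here: both football columns, then both basketball columns). A minimal sketch of the same reshape with `varying` given as a list, which spells out which wide columns feed each long column; the object name `long` is only illustrative:

```r
# Same example data as in the question
df <- data.frame(
  football_enjoyment       = c(5, 3),
  football_participation   = c(1, 2),
  basketball_enjoyment     = c(5, 5),
  basketball_participation = c(1, 3)
)
df$id <- seq_len(nrow(df))

# Each list element holds the wide columns that become one long column,
# listed in the same order as `times`.
long <- reshape(
  df,
  direction = "long",
  idvar     = "id",
  varying   = list(
    c("football_enjoyment", "basketball_enjoyment"),
    c("football_participation", "basketball_participation")
  ),
  v.names   = c("enjoyment", "participation"),
  timevar   = "sports",
  times     = c("football", "basketball")
)
rownames(long) <- NULL
long
```

With the list form, reordering the columns of `df` does not silently change which values end up under `enjoyment` and `participation`.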

【Comments】:

    【Solution 2】:

    tidyr 1.0.0 has a pivot_longer function that does this:

    library(tidyr)
    
    football_enjoyment <- c(5,3)
    basketball_enjoyment <- c(5,5)
    football_participation <- c(1,2)
    basketball_participation <- c(1,3)
    
    df<- data.frame(football_enjoyment,football_participation, 
                    basketball_enjoyment,basketball_participation)
    df$id <- seq.int(nrow(df))
    df
    #>   football_enjoyment football_participation basketball_enjoyment
    #> 1                  5                      1                    5
    #> 2                  3                      2                    5
    #>   basketball_participation id
    #> 1                        1  1
    #> 2                        3  2
    
    df %>% pivot_longer(-id, names_to = c("sports",".value"), names_sep = "_")
    #> # A tibble: 4 x 4
    #>      id sports     enjoyment participation
    #>   <int> <chr>          <dbl>         <dbl>
    #> 1     1 football           5             1
    #> 2     1 basketball         5             1
    #> 3     2 football           3             2
    #> 4     2 basketball         5             3
    

    Created on 2019-09-20 by the reprex package (v0.3.0)

    【Comments】:
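In the `pivot_longer` call, the special name `".value"` in `names_to` tells tidyr that the second piece of each column name (after splitting on `"_"`) should become an output column name rather than a value, while the first piece goes into `sports`. A short sketch, assuming tidyr >= 1.0.0, that also reorders and renames the result to match the layout requested in the question; the name `long` and the clean-up steps are illustrative, not part of the original answer:

```r
library(tidyr)

# Same example data as in the question
df <- data.frame(
  football_enjoyment       = c(5, 3),
  football_participation   = c(1, 2),
  basketball_enjoyment     = c(5, 5),
  basketball_participation = c(1, 3)
)
df$id <- seq_len(nrow(df))

# "football_enjoyment" -> sports = "football", value stored in column "enjoyment"
long <- pivot_longer(
  df,
  cols      = -id,
  names_to  = c("sports", ".value"),
  names_sep = "_"
)

# Optional clean-up: football rows first, then the column names and
# column order shown in the question.
long <- as.data.frame(long)
long <- long[order(match(long$sports, c("football", "basketball")), long$id), ]
names(long)[names(long) == "enjoyment"]     <- "enjoyment_score"
names(long)[names(long) == "participation"] <- "participation_score"
long <- long[, c("sports", "enjoyment_score", "participation_score", "id")]
rownames(long) <- NULL
long
```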
