【Question title】: Transpose every N rows to new column
【Posted】: 2019-01-22 04:42:24
【Question】:

I have a df with hundreds of rows in a single column, following this pattern:

    col1
1.  12/17/18
2.  10/10
3.  Best Movie
4.  This is the best movie ever...
5.
6.
7.  1/1/2019
8.  02/10
9.  Worst Movie
10. This movie was awful...

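Is there a way to transpose each row within a block of 4 into its own column, and then stack the next block of 4 underneath in those same columns?

So the final output would look like this: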

Date         Rating     Title       Review
12/17/18      10/10     Best Movie  This is the best movie ever...
1/1/2019      02/10     Worst Movie This movie was awful...

Any suggestions on how to reshape the df to achieve this?

【Comments】:

Tags: r dataframe transpose


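【Solution 1】:

This is essentially a long-to-wide reshape, but you need to create a key column (which will become the column names) and an ID column so that it is clear which values go in which row. In tidyverse syntax: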

library(tidyverse)

df <- data.frame(
    col1 = c("12/17/18", "10/10", "Best Movie", "This is the best movie ever...", "", "", "1/1/2019", "02/10", "Worst Movie", "This movie was awful..."), 
    stringsAsFactors = FALSE
)

df %>% 
    filter(col1 != '') %>%    # drop empty rows
    mutate(key = rep(c('Date', 'Rating', 'Title', 'Review'), n() / 4), 
           id = cumsum(key == 'Date')) %>% 
    spread(key, col1)
#>   id     Date Rating                         Review       Title
#> 1  1 12/17/18  10/10 This is the best movie ever...  Best Movie
#> 2  2 1/1/2019  02/10        This movie was awful... Worst Movie
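
In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. A minimal sketch of the same reshape with the newer function follows; the `fields` vector and the `wide` result name are introduced here for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
    col1 = c("12/17/18", "10/10", "Best Movie", "This is the best movie ever...",
             "", "", "1/1/2019", "02/10", "Worst Movie", "This movie was awful..."),
    stringsAsFactors = FALSE
)

# Illustrative name: the four fields that make up one record, in order
fields <- c("Date", "Rating", "Title", "Review")

wide <- df %>%
    filter(col1 != "") %>%                        # drop empty separator rows
    mutate(key = rep(fields, length.out = n()),   # label each row with its field
           id  = cumsum(key == "Date")) %>%       # a new record starts at every Date
    pivot_wider(names_from = key, values_from = col1)
```

Unlike `spread()`, which sorts the new columns alphabetically, `pivot_wider()` keeps them in order of first appearance, so the columns come out as Date, Rating, Title, Review.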

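This data structure is quite fragile, though; any deviation from the 4-row pattern will throw the reshape off. A better solution would be to preserve the record structure upstream, before the data gets into this state.

【Comments】: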
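
One way to guard against that fragility is to check the row count before reshaping. The sketch below is only an illustration: `reshape_records` is a hypothetical helper, not something from the answers here, and it assumes that blank rows are pure separators and every record has exactly the listed fields.

```r
# Hypothetical helper: reshape a one-column vector into fixed-width records,
# failing loudly if the input does not divide evenly into records.
reshape_records <- function(x, fields) {
    vals <- x[!is.na(x) & x != ""]            # drop blank separator rows
    n_fields <- length(fields)
    if (length(vals) %% n_fields != 0) {
        stop("expected a multiple of ", n_fields,
             " non-empty rows, got ", length(vals))
    }
    out <- as.data.frame(matrix(vals, ncol = n_fields, byrow = TRUE),
                         stringsAsFactors = FALSE)
    names(out) <- fields
    out
}

col1 <- c("12/17/18", "10/10", "Best Movie", "This is the best movie ever...",
          "", "", "1/1/2019", "02/10", "Worst Movie", "This movie was awful...")
res <- reshape_records(col1, c("Date", "Rating", "Title", "Review"))
```

This only catches a wrong total count. A record that is missing one field while another has an extra one would still slip through, which is why fixing the structure upstream remains the safer option.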

    【Solution 2】:

    If every record has the same number of fields, I would just wrap the values into a matrix first. Using @alistaire's data:

    out <- as.data.frame(matrix(df$col1[df$col1!=""], ncol=4, byrow=TRUE))
    names(out) <- c('Date', 'Rating', 'Title', 'Review')
    out
    #      Date Rating       Title                         Review
    #1 12/17/18  10/10  Best Movie This is the best movie ever...
    #2 1/1/2019  02/10 Worst Movie        This movie was awful...
    

    Or even use scan with the multi.line=TRUE argument to assemble the records in one go:

    out <- data.frame(scan(text=df$col1[df$col1 != ""], multi.line=TRUE, what=rep(list(""), 4), sep="\n"))
    names(out) <- c('Date', 'Rating', 'Title', 'Review')
    out
    #      Date Rating       Title                         Review
    #1 12/17/18  10/10  Best Movie This is the best movie ever...
    #2 1/1/2019  02/10 Worst Movie        This movie was awful...
    

    The nice thing about scan is that you can also specify the output types in the what= argument. So if column 2 were an integer, you could do:

    scan(file, multi.line=TRUE, what=list("",1L,"",""), sep="\n")
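
As a runnable variant of the call above, here is a small sketch that uses `text=` instead of a file. The input is made up for illustration: it assumes the ratings are stored as plain integers, unlike the "10/10" strings in the question. Naming the elements of `what` also names the resulting columns.

```r
# Illustrative input: one field per line, four lines per record,
# with the rating given as a bare integer (an assumption for this sketch)
lines <- c("12/17/18", "10", "Best Movie", "This is the best movie ever...",
           "1/1/2019", "2", "Worst Movie", "This movie was awful...")

rec <- scan(text = lines, multi.line = TRUE,
            what = list(Date = "", Rating = 1L, Title = "", Review = ""),
            sep = "\n", quiet = TRUE)

# scan returns a named list; convert it to a data frame
out <- as.data.frame(rec, stringsAsFactors = FALSE)
```

Here `out$Rating` is an integer column, so it can be used in arithmetic or sorting without a separate conversion step.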
    

    【Comments】:
