【Question title】: Reshape data table using multiple timevar
【Posted】: 2021-09-12 18:41:01
【Question】:

I have a dataset DT in which each row represents one person's performance in one race:

personID raceDate raceID finPos
person1 2009-08-14 489801 2
person1 2010-04-17 502397 6
person1 2011-03-10 524554 4
person2 2009-08-14 489801 1
person2 2011-03-10 524554 3
... ... ... ...

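I want to transform the dataset so that there is one row per person, with their race results laid out in order of race date (and NA for any race that person did not enter). For example: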

personID 489801 finPos 502397 finPos 524554 finPos
person1 2009-08-14 2 2010-04-17 6 2011-03-10 4
person2 2009-08-14 1 NA NA 2011-03-10 3

I know I can get part of the way there with reshape, for example:

reshape(DT, direction = "wide", idvar = "raceID", timevar = "raceDate")

But is there a way to make sure each raceID/raceDate/finPos combination stays together?

【Comments】:

    Tags: r dataframe reshape tidyr reshape2


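    【Solution 1】:

    Here is a tidyverse approach. Each date/position pair stays together, but within each pair the order is swapped (finPos comes before raceDate).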

    library(dplyr)  # provides select()
    library(tidyr)  # provides pivot_wider()
    DT %>%
      pivot_wider(id_cols = personID, 
                  names_from = c(raceID), 
                  names_glue = "{raceID}_{.value}",
                  values_from = c(raceDate, finPos)) %>%
      select(personID, sort(colnames(.)))
    
    
    ## A tibble: 2 x 7
    #  personID `489801_finPos` `489801_raceDate` `502397_finPos` `502397_raceDate` `524554_finPos` `524554_raceDate`
    #  <chr>              <int> <chr>                       <int> <chr>                       <int> <chr>            
    #1 person1                2 2009-08-14                      6 2010-04-17                      4 2011-03-10       
    #2 person2                1 2009-08-14                     NA NA                              3 2011-03-10   
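If the installed tidyr is version 1.2.0 or later, the `names_vary` argument of `pivot_wider()` may make the sorting step unnecessary and keep raceDate ahead of finPos within each pair. The following is a minimal sketch under that assumption, not part of the original answer. The data frame is rebuilt from the rows shown in the question, `wide` is only an illustrative name, and ordering by raceID happens to match the date order in this sample, which real data would need to be checked for.

```r
library(tidyr)

# Sample rows copied from the question
DT <- data.frame(
  personID = c("person1", "person1", "person1", "person2", "person2"),
  raceDate = c("2009-08-14", "2010-04-17", "2011-03-10", "2009-08-14", "2011-03-10"),
  raceID   = c(489801L, 502397L, 524554L, 489801L, 524554L),
  finPos   = c(2L, 6L, 4L, 1L, 3L)
)

wide <- pivot_wider(
  DT,
  id_cols     = personID,
  names_from  = raceID,
  values_from = c(raceDate, finPos),
  names_glue  = "{raceID}_{.value}",
  names_sort  = TRUE,       # order the column groups by raceID
  names_vary  = "slowest"   # keep each raceID's value columns adjacent
)

print(names(wide))
```

Because the value columns are listed as `c(raceDate, finPos)`, each race's date column should come before its position column.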
    

    【Comments】:

      【Solution 2】:

      A data.table approach

      library(data.table)
      DT <- fread("personID   raceDate    raceID  finPos
      person1     2009-08-14  489801  2
      person1     2010-04-17  502397  6
      person1     2011-03-10  524554  4
      person2     2009-08-14  489801  1
      person2     2011-03-10  524554  3")
      
      # Cast to wide
      cols <- c("raceDate", "finPos")  #value columns to cast
      answer <- dcast(DT, personID ~ raceID, value.var = cols, drop = FALSE)
      

      The data is now cast to wide format, but the columns are grouped by value variable. All that remains is to reorder the columns...

      # Determine column order
      new_col_order <- CJ( sort(unique(DT$raceID)), cols, sorted = FALSE)[, paste(cols, V1, sep = "_")]
      # Set new column order
      setcolorder(answer, c(setdiff(names(answer), new_col_order), new_col_order))
      #    personID raceDate_489801 finPos_489801 raceDate_502397 finPos_502397 raceDate_524554 finPos_524554
      # 1:  person1      2009-08-14             2      2010-04-17             6      2011-03-10             4
      # 2:  person2      2009-08-14             1            <NA>            NA      2011-03-10             3
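The column order that the answer builds with `CJ()` can also be sketched in base R with `outer()`, which may be easier to inspect step by step. This is an illustrative alternative rather than the answerer's method; `ids` and `new_order` are hypothetical names, and the race IDs are hard-coded from the sample data.

```r
cols <- c("raceDate", "finPos")        # value columns, as in the answer
ids  <- c(489801L, 502397L, 524554L)   # sorted race IDs from the sample data

# outer() builds a 2 x 3 matrix of names (rows = cols, columns = ids);
# as.vector() reads it column by column, so each race's pair stays adjacent.
new_order <- as.vector(outer(cols, ids, paste, sep = "_"))
print(new_order)
```

The resulting vector could then be used in `setcolorder()` in place of `new_col_order`.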
      

      【Comments】:
