【Question title】: Transposing a data-frame by groups in R with missing values
【Posted】: 2020-03-08 20:21:42
【Question】:

I have a data frame that looks like this:


Country    Variable      2012     2013    2014
Germany    Medical       11       2       4
Germany    Transport     12       6       8
France     Medical       15       10      12
France     Transport     17       13      14  
France     Food          24       14      15

I want to transpose the data frame so that the final data frame takes the following form:

Country     year    Medical    Transport     Food 
Germany     2012    11         12            NA
Germany     2013    2          6             NA
Germany     2014    4          8             NA
France      2012    15         17            24
France      2013    10         13            14  
France      2014    12         14            15

I have tried several functions, including melt, reshape, and spread, but none of them worked. Does anyone have any ideas?

【Comments】:

  • gather, then spread

Tags: r reshape transpose spread


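【Solution 1】:

We can also use transpose from data.table: split the data by Country, transpose each piece, and bind the pieces back together with rbindlist. With fill = TRUE, the Food column that Germany lacks is filled with NA.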

library(data.table) # v >= 1.12.4 
rbindlist(lapply(split(df1[-1], df1$Country), function(x) 
   data.table::transpose(x, keep.names = 'year', make.names = "Variable")), 
      idcol = 'Country', fill = TRUE)
#   Country year Medical Transport Food
#1:  France 2012      15        17   24
#2:  France 2013      10        13   14
#3:  France 2014      12        14   15
#4: Germany 2012      11        12   NA
#5: Germany 2013       2         6   NA
#6: Germany 2014       4         8   NA

Data

df1 <- structure(list(Country = c("Germany", "Germany", "France", "France", 
"France"), Variable = c("Medical", "Transport", "Medical", "Transport", 
"Food"), `2012` = c(11L, 12L, 15L, 17L, 24L), `2013` = c(2L, 
6L, 10L, 13L, 14L), `2014` = c(4L, 8L, 12L, 14L, 15L)), 
 class = "data.frame", row.names = c(NA, 
-5L))
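
To make the one-liner above easier to follow, here is the same pipeline unrolled into three named steps. It is only a sketch: the intermediate names groups, per_country, and result are made up for illustration, and the data is rebuilt inline so the block runs on its own (data.table 1.12.4 or later is assumed, as in the answer).

```r
library(data.table)  # assumes v >= 1.12.4 for keep.names / make.names

# Same data as df1 above, rebuilt so the sketch is self-contained
df1 <- data.frame(
  Country  = c("Germany", "Germany", "France", "France", "France"),
  Variable = c("Medical", "Transport", "Medical", "Transport", "Food"),
  `2012` = c(11L, 12L, 15L, 17L, 24L),
  `2013` = c(2L, 6L, 10L, 13L, 14L),
  `2014` = c(4L, 8L, 12L, 14L, 15L),
  check.names = FALSE  # keep 2012/2013/2014 as column names
)

# Step 1: drop the Country column and split the rest into one piece per country
groups <- split(df1[-1], df1$Country)

# Step 2: transpose each piece; the old column names (years) go into a
# "year" column and the values of Variable become the new column names
per_country <- lapply(groups, function(x) {
  data.table::transpose(x, keep.names = "year", make.names = "Variable")
})

# Step 3: stack the pieces; idcol restores the country from the list names,
# and fill = TRUE inserts NA where a country has no such column (Food)
result <- rbindlist(per_country, idcol = "Country", fill = TRUE)
print(result)
```

Because split orders the list by country name, France comes before Germany in the result, which matches the output shown in the answer.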


    【Solution 2】:

    You can first reshape the data to long format and then back to wide format.

    library(tidyr)
    
    # df is the data frame from the question
    df %>%
      pivot_longer(cols = -c(Country, Variable), names_to = "year") %>%
      pivot_wider(names_from = Variable, values_from = value)
    
    # A tibble: 6 x 5
    #  Country year  Medical Transport  Food
    #  <fct>   <chr>   <int>     <int> <int>
    #1 Germany 2012       11        12    NA
    #2 Germany 2013        2         6    NA
    #3 Germany 2014        4         8    NA
    #4 France  2012       15        17    24
    #5 France  2013       10        13    14
    #6 France  2014       12        14    15
    

    For older versions of tidyr, use gather and spread instead:

    df %>%
      gather(year, value, -c(Country, Variable)) %>%
      spread(Variable, value)
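
In the pivot_longer output shown earlier, year comes back as a character column (`<chr>`) because it is built from column names. If a numeric year is preferred, one option is the names_transform argument of pivot_longer. The sketch below assumes tidyr 1.1.0 or later and rebuilds the question's data inline; the name res is only illustrative.

```r
library(tidyr)

# Same data as in the question, rebuilt so the sketch runs on its own
df <- data.frame(
  Country  = c("Germany", "Germany", "France", "France", "France"),
  Variable = c("Medical", "Transport", "Medical", "Transport", "Food"),
  `2012` = c(11L, 12L, 15L, 17L, 24L),
  `2013` = c(2L, 6L, 10L, 13L, 14L),
  `2014` = c(4L, 8L, 12L, 14L, 15L),
  check.names = FALSE  # keep 2012/2013/2014 as column names
)

res <- df %>%
  pivot_longer(
    cols = -c(Country, Variable),
    names_to = "year",
    names_transform = list(year = as.integer)  # assumes tidyr >= 1.1.0
  ) %>%
  pivot_wider(names_from = Variable, values_from = value)

str(res$year)  # year is now an integer vector
```

On an older tidyr, converting afterwards with `res$year <- as.integer(res$year)` should give the same result.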
    
