【Question title】: Formatting data (transposing) [duplicate]
【Posted】: 2021-12-23 18:50:23
【Question】:

I'm new to the R environment and I'd like to improve my data-manipulation skills. I have this dataset:

    structure(list(taxon = c("Acroloxus lacustris", "Ancylus fluviatilis", 
"Ancylus fluviatilis", "Asellus aquaticus", "Asellus aquaticus", 
"Asellus aquaticus", "Bithynia tentaculata", "Bryozoa Gen. sp.", 
"Chironomidae Gen. sp.", "Ephydatia fluviatilis", "Erpobdella octoculata", 
"Erpobdella octoculata", "Erpobdella octoculata", "Glossiphonia complanata", 
"Physella acuta", "Plumatella fungosa", "Plumatella fungosa", 
"Plumatella repens", "Radix balthica/labiata", "Radix balthica/labiata", 
"Radix balthica/labiata", "Sphaerium corneum", "Spongilla lacustris", 
"Spongillidae Gen. sp.", "Tubificidae Gen. sp."), year = c(1971, 
1969, 1971, 1968, 1969, 1971, 1971, 1971, 1969, 1971, 1968, 1969, 
1971, 1971, 1971, 1968, 1971, 1971, 1968, 1969, 1971, 1971, 1969, 
1971, 1971), abundance = c(12.5714285714286, 6, 15.5, 1, 13, 
100.333333333333, 2.11111111111111, 13, 6, 7, 20, 42.5, 22.875, 
1, 1, 20, 3.5, 2.66666666666667, 20, 42.5, 17.5789473684211, 
65, 6, 42.5, 1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -25L), groups = structure(list(taxon = c("Acroloxus lacustris", 
"Ancylus fluviatilis", "Asellus aquaticus", "Bithynia tentaculata", 
"Bryozoa Gen. sp.", "Chironomidae Gen. sp.", "Ephydatia fluviatilis", 
"Erpobdella octoculata", "Glossiphonia complanata", "Physella acuta", 
"Plumatella fungosa", "Plumatella repens", "Radix balthica/labiata", 
"Sphaerium corneum", "Spongilla lacustris", "Spongillidae Gen. sp.", 
"Tubificidae Gen. sp."), .rows = structure(list(1L, 2:3, 4:6, 
    7L, 8L, 9L, 10L, 11:13, 14L, 15L, 16:17, 18L, 19:21, 22L, 
    23L, 24L, 25L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -17L), .drop = TRUE))

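I would like to rearrange this dataset into the following format:

Columns: species 1, species 2, species 3, ... Rows: years (1968, 1969, and 1971, so 3 rows). As you can see, not every species appears in every year. It should look like this: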

          Sp1     Sp2...
1968       1       0
1969       0       45
1971       10      0

But I don't know how to rearrange it. I think %>%, t(), or mutate() might be useful... Thanks for your time.

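【Comments】:

  • I loaded this, assigned it to a variable, and converted it to a data frame without errors. You should look at the tidyr package, in particular the spread, gather, separate, and unite functions. Here's a good introduction: rstudio-education.github.io/tidyverse-cookbook/tidy.html
  • @LSM-DAT_Linux Note, however, that pivot_wider and pivot_longer are now the preferred options over spread and gather (which have been superseded).
  • @Andrew Gillreath-Brown Not if I alias them ;) I prefer spread and gather as verbs. pivot_wider and pivot_longer sound like they were named by an Excel cabal.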

Tags: r dplyr


【Solution 1】:

Something like the following?

library(tidyverse)

df %>% 
  pivot_wider(id_cols = year, names_from = taxon, values_from = abundance) %>%
  arrange(year)

#> # A tibble: 3 × 18
#>    year `Acroloxus lacustris` `Ancylus fluvia… `Asellus aquati… `Bithynia tenta…
#>   <dbl>                 <dbl>            <dbl>            <dbl>            <dbl>
#> 1  1968                  NA               NA                 1             NA   
#> 2  1969                  NA                6                13             NA   
#> 3  1971                  12.6             15.5             100.             2.11
#> # … with 13 more variables: Bryozoa Gen. sp. <dbl>,
#> #   Chironomidae Gen. sp. <dbl>, Ephydatia fluviatilis <dbl>,
#> #   Erpobdella octoculata <dbl>, Glossiphonia complanata <dbl>,
#> #   Physella acuta <dbl>, Plumatella fungosa <dbl>, Plumatella repens <dbl>,
#> #   Radix balthica/labiata <dbl>, Sphaerium corneum <dbl>,
#> #   Spongilla lacustris <dbl>, Spongillidae Gen. sp. <dbl>,
#> #   Tubificidae Gen. sp. <dbl>
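
The question's desired output shows 0 rather than NA for species that were not recorded in a given year. A minimal sketch of one way to get that, assuming a tidyr version that accepts a scalar `values_fill` (1.1.0 or later). `df_small` is a hypothetical five-row subset of the question's data, and the `ungroup()` call is only a precaution because the original data is grouped by `taxon`; it may not be needed:

```r
library(dplyr)
library(tidyr)

# Hypothetical small subset of the question's data, grouped by taxon
# like the original grouped_df
df_small <- tibble(
  taxon = c("Ancylus fluviatilis", "Ancylus fluviatilis",
            "Asellus aquaticus", "Asellus aquaticus", "Asellus aquaticus"),
  year = c(1969, 1971, 1968, 1969, 1971),
  abundance = c(6, 15.5, 1, 13, 100.333333333333)
) %>%
  group_by(taxon)

wide <- df_small %>%
  ungroup() %>%                      # drop the taxon grouping before reshaping
  pivot_wider(
    id_cols = year,
    names_from = taxon,
    values_from = abundance,
    values_fill = 0                  # missing year/taxon combinations become 0
  ) %>%
  arrange(year)

wide
```

If the years should be row names rather than a column, as in the question's sketch, `tibble::column_to_rownames(as.data.frame(wide), "year")` is one option.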

【Comments】:

    【Solution 2】:

    A data.table option:

    library(data.table)
    
    dcast(setDT(df), year~taxon, value.var='abundance')
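
As written, the `dcast()` call leaves NA where a taxon was not recorded in a year. A minimal sketch that fills those cells with 0 through the `fill` argument of `dcast()`; `dt_small` is a hypothetical subset of the question's data, built directly as a data.table rather than converted with `setDT()`:

```r
library(data.table)

# Hypothetical small subset of the question's data
dt_small <- data.table(
  taxon = c("Ancylus fluviatilis", "Ancylus fluviatilis",
            "Asellus aquaticus", "Asellus aquaticus", "Asellus aquaticus"),
  year = c(1969, 1971, 1968, 1969, 1971),
  abundance = c(6, 15.5, 1, 13, 100.333333333333)
)

# One row per year, one column per taxon; absent combinations become 0
wide_dt <- dcast(dt_small, year ~ taxon, value.var = "abundance", fill = 0)

wide_dt
```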
    

    【Comments】:
