【Question title】: Arranging rows in R so that year column is in a custom order, and other columns with identical entries are grouped
【Posted】: 2021-01-11 00:44:45
【Question】:

I have tried the solution from "Arranging rows in custom order using dplyr" but still can't figure it out for my data frame.

I have a data frame, USA_tech, with 62,000 entries that looks like this:

region     supplysector    subsector    technology    year    coefficient   tech_change
AK          agriculture   agriculture   agriculture   1975      .01             NA  
AL          agriculture   agriculture   agriculture   1975      .22             NA
AR          agriculture   agriculture   agriculture   1975      .04             NA
AZ          agriculture   agriculture   agriculture   1975      .09             NA
AK          construction  construction  construction  1975      .14             NA
AL          construction  construction  construction  1975      .30             NA
AR          construction  construction  construction  1975      .07             NA
AZ          construction  construction  construction  1975      .06             NA

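The data frame has year values from 1975 to 2100, generally in 5-year increments. It is currently arranged by year in ascending order, with the 50 states in ascending order and supplysector/subsector/technology all grouped together.

I would like all states to be next to each other (all AK entries at the top, WY at the bottom), supplysector/subsector/technology kept together, and the year rows alternating (one 1975 entry, then one 1990 entry, and so on up to 2100), so that it looks like this: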

region     supplysector    subsector    technology    year    coefficient   tech_change
AK          agriculture   agriculture   agriculture   1975      .01             NA  
AK          agriculture   agriculture   agriculture   1990      .12             NA  
AK          agriculture   agriculture   agriculture   2005      .05             NA  
AK          agriculture   agriculture   agriculture   2010      .34             NA  
AK          agriculture   agriculture   agriculture   2015       NA             .3  
AK          agriculture   agriculture   agriculture   2020       NA             .2  
AK          agriculture   agriculture   agriculture   2025       NA             .1  

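The order matters because the coefficient from the previous year/row will be used to calculate the next year's value for that state, supplysector, subsector, and technology combination.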
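Why the row order matters can be sketched roughly as follows: once the rows are sorted by year within each combination, `dplyr::lag()` picks up the previous year's coefficient for the same group. This is only an illustration on hypothetical data. `toy` and `prev_coefficient` are made-up names, the numbers are invented, and the actual formula for the next year's value is not given in the question.

```r
library(dplyr)

# Hypothetical toy data, not the real USA_tech
toy <- data.frame(
  region       = c("AK", "AL", "AK", "AL"),
  supplysector = "agriculture",
  year         = c(1975L, 1975L, 1990L, 1990L),
  coefficient  = c(0.01, 0.22, 0.12, 0.25),
  stringsAsFactors = FALSE
)

with_prev <- toy %>%
  # sort so that each group's years are consecutive and ascending
  arrange(region, supplysector, year) %>%
  group_by(region, supplysector) %>%
  # previous year's coefficient within the same group (NA for the first year)
  mutate(prev_coefficient = dplyr::lag(coefficient)) %>%
  ungroup()

print(with_prev)
```

Grouping before calling `lag()` keeps the first year of one state from picking up the last year of the state above it.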

Here is what I tried:

USA_tech_change_arranged <- USA_tech %>%
  arrange( match( year, c( 1975, 1990, 2005, 2010, 2015, 2020, 2025, 2030, 2035, 2040, 2045,
                           2050, 2055, 2060, 2065, 2070, 2075, 2080, 2085, 2090, 2095, 2100 ) ), region, supplysector )

This sort of worked, but the year order was not applied the way I need.

Thanks!

Data:

> dput(USA_tech)
structure(list(region = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L), .Label = c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", 
"DE", "FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", 
"MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", 
"NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", 
"SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"), 
    supplysector = structure(c(1L, 1L, 5L, 5L, 5L, 5L, 1L, 1L, 
    5L, 5L, 5L, 5L, 1L, 1L, 5L, 5L, 5L, 5L, 1L, 1L, 5L, 5L, 5L, 
    5L), .Label = c("agriculture", "aluminum and nonferrous metals", 
    "cement energy processes", "chemicals", "construction", "food processing", 
    "iron and steel", "mining", "other manufacturing", "other nonmetallic minerals", 
    "pulp paper and wood"), class = "factor"), subsector = structure(c(1L, 
    1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 
    5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L), .Label = c("agriculture energy", 
    "boilers", "boilers_CHP", "construction energy", "construction feedstocks", 
    "electrochemical", "feedstocks", "machine drive", "mining energy", 
    "other uses", "process heat"), class = "factor"), technology = structure(c(1L, 
    1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 
    5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L), .Label = c("agriculture energy", 
    "boilers", "boilers_CHP", "construction energy", "construction feedstocks", 
    "electrochemical", "feedstocks", "machine drive", "mining energy", 
    "other uses", "process heat"), class = "factor"), year = c(1975L, 
    1975L, 1975L, 1975L, 1975L, 1975L, 1990L, 1990L, 1990L, 1990L, 
    1990L, 1990L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2010L, 
    2010L, 2010L, 2010L, 2010L, 2010L), coefficient = c(0.00415283842675507, 
    0.0105087067678448, 0.00251527374007625, 0.00401004800633499, 
    0.00539236968879248, 0.00185602527562958, 0.00428571855936047, 
    0.00602247397804429, 0.00520793681510989, 0.00246830444537675, 
    0.00355039681492185, 0.00265090847659984, 0.005530092870379, 
    0.00728658128739465, 0.00796292303916165, 0.00288955401140914, 
    0.00282405286490722, 0.00494969254413892, 0.00515548884308403, 
    0.00562261318636465, 0.00629285089491791, 0.00235150770450291, 
    0.00172229336981847, 0.00357616051072436), market.name = structure(c(1L, 
    2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("AK", "AL", "AR", 
    "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", "IA", 
    "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME", "MI", 
    "MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM", 
    "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", 
    "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"), 
    tech_change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
-24L), class = "data.frame")

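【Comments】:

  • Please add a meaningful sample of data so your issue can be reproduced. There are many ways to do what you want, but it is impossible to help without data!
  • I think you need to arrange by region first, then by year.
  • Could you provide some sample data (with varying column values) via dput?
  • @Duck Added data from dput(head), since the full dput produced too much
  • @starja I updated the data to a filtered version of my data frame that includes different years and states for more variation

Tags: r dplyr data-manipulation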


【Solution 1】:

Is this what you want?

library(dplyr)

USA_tech %>% 
  arrange(region, supplysector, subsector, technology, year)
#>    region supplysector               subsector              technology year
#> 1      AK  agriculture      agriculture energy      agriculture energy 1975
#> 2      AK  agriculture      agriculture energy      agriculture energy 1990
#> 3      AK  agriculture      agriculture energy      agriculture energy 2005
#> 4      AK  agriculture      agriculture energy      agriculture energy 2010
#> 5      AK construction     construction energy     construction energy 1975
#> 6      AK construction     construction energy     construction energy 1990
#> 7      AK construction     construction energy     construction energy 2005
#> 8      AK construction     construction energy     construction energy 2010
#> 9      AK construction construction feedstocks construction feedstocks 1975
#> 10     AK construction construction feedstocks construction feedstocks 1990
#> 11     AK construction construction feedstocks construction feedstocks 2005
#> 12     AK construction construction feedstocks construction feedstocks 2010
#> 13     AL  agriculture      agriculture energy      agriculture energy 1975
#> 14     AL  agriculture      agriculture energy      agriculture energy 1990
#> 15     AL  agriculture      agriculture energy      agriculture energy 2005
#> 16     AL  agriculture      agriculture energy      agriculture energy 2010
#> 17     AL construction     construction energy     construction energy 1975
#> 18     AL construction     construction energy     construction energy 1990
#> 19     AL construction     construction energy     construction energy 2005
#> 20     AL construction     construction energy     construction energy 2010
#> 21     AL construction construction feedstocks construction feedstocks 1975
#> 22     AL construction construction feedstocks construction feedstocks 1990
#> 23     AL construction construction feedstocks construction feedstocks 2005
#> 24     AL construction construction feedstocks construction feedstocks 2010
#>    coefficient market.name tech_change
#> 1  0.004152838          AK          NA
#> 2  0.004285719          AK          NA
#> 3  0.005530093          AK          NA
#> 4  0.005155489          AK          NA
#> 5  0.002515274          AK          NA
#> 6  0.005207937          AK          NA
#> 7  0.007962923          AK          NA
#> 8  0.006292851          AK          NA
#> 9  0.005392370          AK          NA
#> 10 0.003550397          AK          NA
#> 11 0.002824053          AK          NA
#> 12 0.001722293          AK          NA
#> 13 0.010508707          AL          NA
#> 14 0.006022474          AL          NA
#> 15 0.007286581          AL          NA
#> 16 0.005622613          AL          NA
#> 17 0.004010048          AL          NA
#> 18 0.002468304          AL          NA
#> 19 0.002889554          AL          NA
#> 20 0.002351508          AL          NA
#> 21 0.001856025          AL          NA
#> 22 0.002650908          AL          NA
#> 23 0.004949693          AL          NA
#> 24 0.003576161          AL          NA

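Created on 2020-09-24 by the reprex package (v0.3.0)

When you use arrange(), the data is sorted by the columns in the order you list them: the first column is the primary sort key, and each later column only breaks ties among the earlier ones. Since year is numeric, sorting on it last puts the years in ascending order within each region/supplysector/subsector/technology combination, with no match() call needed.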
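To make the effect of key order concrete, here is a minimal sketch on a hypothetical four-row data frame (`toy` and `year_order` are made-up names for illustration, not part of the original data). Putting `year` first makes it the primary key, as in the question's attempt, while putting it last sorts years within each region. If a truly custom year order were needed, the `match()` idea from the question should still work as the last key.

```r
library(dplyr)

# Hypothetical toy data, not the asker's USA_tech
toy <- data.frame(
  region = c("AL", "AK", "AL", "AK"),
  year   = c(1990L, 1990L, 1975L, 1975L),
  stringsAsFactors = FALSE
)

# year first: year is the primary key, so regions are interleaved
by_year_first <- toy %>% arrange(year, region)

# region first: all AK rows together, years ascending within each region
by_region_first <- toy %>% arrange(region, year)

# custom year order, applied as the last (tie-breaking) key
year_order <- c(1975, 1990, 2005, 2010)
by_custom <- toy %>% arrange(region, match(year, year_order))

print(by_year_first)
print(by_region_first)
```

Here `by_custom` ends up in the same order as `by_region_first`, because the custom order in `year_order` happens to be ascending.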

【Comments】:

  • OK, thank you! I didn't realize the order mattered...