Posted: 2021-01-11 00:44:45
Question:
I have tried the solution from Arranging rows in custom order using dplyr, but I still can't figure it out for my data frame.
I have a data frame USA_tech with 62,000 rows that looks like this:
region supplysector subsector technology year coefficient tech_change
AK agriculture agriculture agriculture 1975 .01 NA
AL agriculture agriculture agriculture 1975 .22 NA
AR agriculture agriculture agriculture 1975 .04 NA
AZ agriculture agriculture agriculture 1975 .09 NA
AK construction construction construction 1975 .14 NA
AL construction construction construction 1975 .30 NA
AR construction construction construction 1975 .07 NA
AZ construction construction construction 1975 .06 NA
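The data frame covers the years 1975 to 2100, usually in 5-year increments. It is currently sorted by year in ascending order, then by the 50 states in ascending order, with supplysector/subsector/technology grouped together.
I want all rows for each state to sit next to each other (all AK entries at the top, WY at the bottom), with each supplysector/subsector/technology combination kept together, and the year advancing from row to row within that combination (one 1975 row, then one 1990 row, and so on through 2100), so it looks like this: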
region supplysector subsector technology year coefficient tech_change
AK agriculture agriculture agriculture 1975 .01 NA
AK agriculture agriculture agriculture 1990 .12 NA
AK agriculture agriculture agriculture 2005 .05 NA
AK agriculture agriculture agriculture 2010 .34 NA
AK agriculture agriculture agriculture 2015 NA .3
AK agriculture agriculture agriculture 2020 NA .2
AK agriculture agriculture agriculture 2025 NA .1
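The order matters because the coefficient in each year/row above is used to calculate the value for the next year for that combination of state, supplysector, subsector, and technology.
Here is what I tried: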
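The question does not say what the next-year calculation is, so the following is only a hypothetical sketch of how the "previous year's coefficient" can be looked up once the rows are grouped. `dplyr::lag()` inside `group_by()` looks one row back within each state/supplysector/subsector/technology combination, and passing `order_by = year` makes that lookup follow year order even if the rows are not yet sorted. The toy data frame `toy` and the column name `prev_coefficient` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data in the shape of USA_tech: two states, one sector,
# sorted by year first (the layout the question starts from).
toy <- data.frame(
  region       = factor(c("AK", "AL", "AK", "AL", "AK", "AL")),
  supplysector = "agriculture",
  subsector    = "agriculture",
  technology   = "agriculture",
  year         = c(1975L, 1975L, 1990L, 1990L, 2005L, 2005L),
  coefficient  = c(0.01, 0.22, 0.12, 0.05, 0.34, 0.07)
)

with_prev <- toy %>%
  group_by(region, supplysector, subsector, technology) %>%
  # lag() returns the value one row back within each group; order_by = year
  # makes "one row back" mean "previous year" regardless of current row order.
  mutate(prev_coefficient = lag(coefficient, order_by = year)) %>%
  ungroup() %>%
  # Sort with the grouping columns first and year last for display.
  arrange(region, supplysector, subsector, technology, year)

print(with_prev)
```

The first year of each combination has no predecessor, so its `prev_coefficient` is `NA`.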
USA_tech_change_arranged <- USA_tech %>%
  arrange(match(year, c(1975, 1990, 2005, 2010, 2015, 2020, 2025, 2030, 2035, 2040, 2045,
                        2050, 2055, 2060, 2065, 2070, 2075, 2080, 2085, 2090, 2095, 2100)),
          region, supplysector)
This partly works, but the year ordering is not applied the way I need.
Thanks!
Data:
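A minimal sketch of one possible fix, assuming the target is the ordering shown in the desired output: `arrange()` sorts by its first argument and uses later arguments only to break ties, so putting `match(year, ...)` first makes year the outermost sort key. Listing the grouping columns first and year last gives the nesting state, then sector/subsector/technology, then year. Because `region` is a factor whose levels run alphabetically from AK to WY, it is sorted by level order, which here is alphabetical. The data frame `toy` and its coefficient values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data shaped like USA_tech, currently sorted by year first.
sector <- rep(c("agriculture", "agriculture", "construction", "construction"), times = 2)
toy <- data.frame(
  region       = factor(rep(c("AK", "AL"), times = 4)),
  supplysector = sector,
  subsector    = sector,
  technology   = sector,
  year         = rep(c(1975L, 1990L), each = 4),
  coefficient  = c(0.01, 0.22, 0.14, 0.30, 0.12, 0.05, 0.08, 0.02)
)

# Outer keys first, year last: rows for one state stay together, each
# sector/subsector/technology block stays together, and years ascend inside it.
arranged <- toy %>%
  arrange(region, supplysector, subsector, technology, year)

# If a custom year order were ever needed, the match() call from the question
# still works, but as the LAST key. Years missing from year_order become NA
# and are sorted to the end.
year_order <- c(1975, 1990, 2005, seq(2010, 2100, by = 5))
arranged2 <- toy %>%
  arrange(region, supplysector, subsector, technology, match(year, year_order))

print(arranged)
```

Since the years are plain ascending integers, `year` on its own is enough as the last key; the `match()` variant only matters for a non-numeric ordering.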
> dput(USA_tech)
structure(list(region = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC",
"DE", "FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA",
"MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE",
"NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC",
"SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"),
supplysector = structure(c(1L, 1L, 5L, 5L, 5L, 5L, 1L, 1L,
5L, 5L, 5L, 5L, 1L, 1L, 5L, 5L, 5L, 5L, 1L, 1L, 5L, 5L, 5L,
5L), .Label = c("agriculture", "aluminum and nonferrous metals",
"cement energy processes", "chemicals", "construction", "food processing",
"iron and steel", "mining", "other manufacturing", "other nonmetallic minerals",
"pulp paper and wood"), class = "factor"), subsector = structure(c(1L,
1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L,
5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L), .Label = c("agriculture energy",
"boilers", "boilers_CHP", "construction energy", "construction feedstocks",
"electrochemical", "feedstocks", "machine drive", "mining energy",
"other uses", "process heat"), class = "factor"), technology = structure(c(1L,
1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L, 1L, 1L, 4L, 4L,
5L, 5L, 1L, 1L, 4L, 4L, 5L, 5L), .Label = c("agriculture energy",
"boilers", "boilers_CHP", "construction energy", "construction feedstocks",
"electrochemical", "feedstocks", "machine drive", "mining energy",
"other uses", "process heat"), class = "factor"), year = c(1975L,
1975L, 1975L, 1975L, 1975L, 1975L, 1990L, 1990L, 1990L, 1990L,
1990L, 1990L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L), coefficient = c(0.00415283842675507,
0.0105087067678448, 0.00251527374007625, 0.00401004800633499,
0.00539236968879248, 0.00185602527562958, 0.00428571855936047,
0.00602247397804429, 0.00520793681510989, 0.00246830444537675,
0.00355039681492185, 0.00265090847659984, 0.005530092870379,
0.00728658128739465, 0.00796292303916165, 0.00288955401140914,
0.00282405286490722, 0.00494969254413892, 0.00515548884308403,
0.00562261318636465, 0.00629285089491791, 0.00235150770450291,
0.00172229336981847, 0.00357616051072436), market.name = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("AK", "AL", "AR",
"AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", "IA",
"ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME", "MI",
"MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM",
"NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN",
"TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"),
tech_change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-24L), class = "data.frame")
Comments:
-
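Please add a meaningful sample of your data so that the problem can be reproduced. There are many ways to do what you want, but without data it is impossible to help you!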
-
I think you need to sort by region first, and then by year.
-
Could you provide some sample data (with varying values in the columns) using dput?
-
@Duck I added data from dput(head), because dput on the full data frame produced too much output.
-
@starja I updated the data to a filtered version of my data frame that includes different years and states, for more variation.
Tags: r dplyr data-manipulation