【问题标题】:Transforming multiple columns structure using Dplyr in R在 R 中使用 Dplyr 转换多列结构
【发布时间】:2020-05-10 01:13:59
【问题描述】:

我有一个数据集,df,

State         Year         0       1         2       3         4         5 

Georgia      2001         10,000   200      300      400      500        800  
Georgia      2002         20,000   500      500      1,000    2,000      2,500
Georgia      2003         2,000    5,000    1,000    400      300        8,000
Washington   2001         1,000    10,000   6,000    8,000    9,900      10,000
Washington   2006         5,000    300      200      900      1,000      8,000

我希望我想要的输出如下所示:

State          Year         Age         Population      

Georgia        2001         0           10,000
Georgia        2002         0           20,000
Georgia        2003         0           2,000
Georgia        2001         1           200
Georgia        2002         1           500
Georgia        2003         1           5000
Georgia        2001         2           300
Georgia        2002         2           500
Georgia        2003         2           1000
Georgia        2001         3           400
Georgia        2002         3           1000
Georgia        2003         3           400
Georgia        2001         4           500
Georgia        2002         4           2000
Georgia        2003         4           300
Georgia        2001         5           800
Georgia        2002         5           2500
Georgia        2003         5           8000
Washington     2001         0           1000
Washington     2006         0           5000
Washington     2001         1           10000
Washington     2006         1           300
Washington     2001         2           6000
Washington     2006         2           200
Washington     2001         3           8000
Washington     2006         3           900
Washington     2001         4           9900
Washington     2006         4           1000
Washington     2001         5           10000
Washington     2006         5           8200

这是我的输入

structure(list(state = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("georgia", 
"washington"), class = "factor"), year = c(2001L, 2002L, 2003L, 
2001L, 2006L), X0 = structure(c(1L, 3L, 4L, 2L, 5L), .Label = c("10,000", 
"1000", "20,000", "2000", "5000"), class = "factor"), X1 = structure(c(2L, 
4L, 5L, 1L, 3L), .Label = c("10,000", "200", "300", "500", "5000"
), class = "factor"), X2 = c(300L, 500L, 1000L, 6000L, 200L), 
 X3 = c(400L, 1000L, 400L, 8000L, 900L), X4 = c(500L, 2000L, 
300L, 99000L, 1000L), X5 = structure(c(3L, 2L, 4L, 1L, 4L
), .Label = c("10,000", "2500", "800", "8000"), class = "factor")), class = "data.frame", row.names 
 = 
   c(NA, 
-5L))

这是我尝试过的:

我知道我必须按州和年份进行分组,并可能通过使用gather() 函数来执行某种类型的数据透视

library(tidyr)
library(dplyr)

df1 <- gather(df, 0, 1, 2, 3, 4, 5 factor_key=TRUE)

df %>% groupby(State, Year) %>%
mutate('Age', 'Population')

【问题讨论】:

  • tidyr::gather(df, key, value, -c(state, year))。另请注意,gather 已停用并由pivot_longer 取代。

标签: r dplyr tidyverse tidyr


【解决方案1】:

我们可以先将列类型通过提取数字部分转换为数字,然后再做reshape

library(dplyr)
library(tidyr)
df %>% 
  mutate_at(vars(matches('\\d+$')), ~readr::parse_number(as.character(.))) %>% 
  pivot_longer(cols = -c(state, year), names_to = "Age", values_to = "Population")

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2018-07-21
    • 2016-12-07
    • 2017-09-02
    • 2018-11-05
    相关资源
    最近更新 更多