【Question title】: Reshaping a long format data frame to wide based on two columns [duplicate]
【Posted】: 2020-10-10 08:01:34
【Question】:

I have a data frame that looks like this:

dat <- data.frame(QuarterYear = c("Q4 2019", "Q4 2019", "Q4 2019", 
                              "Q4 2019", "Q4 2019", "Q4 2019", "Q4 2019", "Q4 2019", "Q4 2019", 
                              "Q4 2019", "Q4 2019", "Q4 2019", "Q1 2020", "Q1 2020", "Q1 2020", 
                              "Q1 2020", "Q1 2020", "Q1 2020", "Q1 2020", "Q1 2020", "Q1 2020", 
                              "Q1 2020", "Q1 2020", "Q1 2020", "Q2 2020", "Q2 2020", "Q2 2020", 
                              "Q2 2020", "Q2 2020", "Q2 2020", "Q2 2020", "Q2 2020", "Q2 2020", 
                              "Q2 2020", "Q2 2020", "Q2 2020", "Q3 2020", "Q3 2020", "Q3 2020", 
                              "Q3 2020", "Q3 2020", "Q3 2020", "Q3 2020", "Q3 2020", "Q3 2020", 
                              "Q3 2020", "Q3 2020", "Q3 2020"), 
              Grade = c("Grade 8", "Grade 8", 
                        "Grade 8", "Grade 9", "Grade 9", "Grade 9", "Grade 10", "Grade 10", 
                        "Grade 10", "Grade 11", "Grade 11", "Grade 11", "Grade 8", "Grade 8", 
                        "Grade 8", "Grade 9", "Grade 9", "Grade 9", "Grade 10", "Grade 10", 
                        "Grade 10", "Grade 11", "Grade 11", "Grade 11", "Grade 8", "Grade 8", 
                        "Grade 8", "Grade 9", "Grade 9", "Grade 9", "Grade 10", "Grade 10", 
                        "Grade 10", "Grade 11", "Grade 11", "Grade 11", "Grade 8", "Grade 8", 
                        "Grade 8", "Grade 9", "Grade 9", "Grade 9", "Grade 10", "Grade 10", 
                        "Grade 10", "Grade 11", "Grade 11", "Grade 11"), 
              Type = c("overallAverage", 
                       "CT", "RT", "overallAverage", "CT", "RT", "overallAverage", "CT", 
                       "RT", "overallAverage", "CT", "RT", "overallAverage", "CT", "RT", 
                       "overallAverage", "CT", "RT", "overallAverage", "CT", "RT", "overallAverage", 
                       "CT", "RT", "overallAverage", "CT", "RT", "overallAverage", "CT", 
                       "RT", "overallAverage", "CT", "RT", "overallAverage", "CT", "RT", 
                       "overallAverage", "CT", "RT", "overallAverage", "CT", "RT", "overallAverage", 
                       "CT", "RT", "overallAverage", "CT", "RT"), 
              value = c(2.48, 2.21, 
                        0.27, 3.48, 3.03, 0.45, 4.6, 4, 0.6, 2.8, 2.4, 0.4, 2.54, 2.28, 
                        0.26, 3.45, 3, 0.45, 4.46, 3.88, 0.58, 3.56, 2.81, 0.75, 2.47, 
                        2.14, 0.33, 2.96, 2.54, 0.41, 4.1, 3.69, 0.41, 3.44, 2.61, 0.83, 
                        2, 1.81, 0.19, 2.54, 2.26, 0.28, 4.11, 3.68, 0.43, 2.67, 2.11, 
                        0.56), stringsAsFactors = FALSE)

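I am trying to reshape this data frame to wide format, where the unique values of Type become the rows and the values are spread into columns according to QuarterYear and Grade.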

Put simply, if the first row is overallAverage, the first 4 columns would hold Q4 2019-Grade 8 through Q3 2020-Grade 8. The next 4 columns would hold Q4 2019-Grade 9 through Q3 2020-Grade 9, and so on.
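The target column layout can be spelled out as a vector of keys. A minimal base-R sketch, assuming the quarter and grade labels from the data above and a hypothetical "Q4 2019 Grade 8" key format (grade varies slowest, quarter fastest):

```r
# Labels as they appear in the question's data; quarters in chronological order
quarters <- c("Q4 2019", "Q1 2020", "Q2 2020", "Q3 2020")
grades   <- c("Grade 8", "Grade 9", "Grade 10", "Grade 11")

# Hypothetical key format "Q4 2019 Grade 8": each grade gets one block of
# four consecutive quarter columns
desired_cols <- paste(rep(quarters, times = length(grades)),
                      rep(grades, each = length(quarters)))

# desired_cols[1:5] holds "Q4 2019 Grade 8", "Q1 2020 Grade 8",
# "Q2 2020 Grade 8", "Q3 2020 Grade 8", "Q4 2019 Grade 9"
```

Any reshaping approach then only has to produce its columns in this order.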

I tried using the reshape function:

widerDat <- reshape(dat, direction = "wide",idvar = "Type",timevar = "value")  

如何组合QuarterYearGrade 以获得所需的输出?

请帮助我找到合适的解决方案。提前致谢!!

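【Question comments】:

  • Does this work? dat %>% pivot_wider(names_from = c(QuarterYear, Grade), values_from = value)
  • Does this answer your question? How to reshape data from long to wide format
  • @prosoitos Apparently not; the OP is dealing with the special case of a time variable split across two columns.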
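To illustrate the point in the last comment: once the two columns are pasted into a single key, any long-to-wide tool applies. A minimal base-R sketch using tapply() (a swapped-in technique, not taken from either answer below), where the levels of the hypothetical key factor decide the column order; dat is rebuilt compactly with rep() so the block runs on its own:

```r
# Compact rebuild of the question's `dat` (same rows, same order, same values)
dat <- data.frame(
  QuarterYear = rep(c("Q4 2019", "Q1 2020", "Q2 2020", "Q3 2020"), each = 12),
  Grade = rep(rep(c("Grade 8", "Grade 9", "Grade 10", "Grade 11"), each = 3), times = 4),
  Type = rep(c("overallAverage", "CT", "RT"), times = 16),
  value = c(2.48, 2.21, 0.27, 3.48, 3.03, 0.45, 4.6, 4, 0.6, 2.8, 2.4, 0.4,
            2.54, 2.28, 0.26, 3.45, 3, 0.45, 4.46, 3.88, 0.58, 3.56, 2.81, 0.75,
            2.47, 2.14, 0.33, 2.96, 2.54, 0.41, 4.1, 3.69, 0.41, 3.44, 2.61, 0.83,
            2, 1.81, 0.19, 2.54, 2.26, 0.28, 4.11, 3.68, 0.43, 2.67, 2.11, 0.56),
  stringsAsFactors = FALSE
)

# Collapse the split "time" variable into one key; the factor levels fix
# the column order (grade slowest, quarter fastest)
quarters <- unique(dat$QuarterYear)   # already chronological in this data
grades   <- unique(dat$Grade)
key <- factor(paste(dat$QuarterYear, dat$Grade),
              levels = paste(rep(quarters, times = length(grades)),
                             rep(grades, each = length(quarters))))
type <- factor(dat$Type, levels = unique(dat$Type))

# Exactly one value per Type x key cell, so sum() simply returns that value
wide <- tapply(dat$value, list(type, key), sum)

# Back to a data frame with Type as the first column
wide_df <- data.frame(Type = rownames(wide), wide,
                      check.names = FALSE, row.names = NULL)
```

If a Type/key combination were missing, tapply() would leave NA in that cell rather than fail.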

Tags: r dataframe reshape


【Solution 1】:

I think this will do it:

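library(tidyverse)

# Combine Grade and QuarterYear into one key such as "Grade 8 Q4 2019",
# then spread that key into columns (one row per Type)
wider_data <- dat %>%
  mutate(new_col = paste(Grade, QuarterYear, sep = " ")) %>%
  select(Type, new_col, value) %>%
  pivot_wider(names_from = new_col, values_from = value)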

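To rearrange the columns manually, use this:

# pivot_wider() orders the new columns by first appearance (all grades for
# Q4 2019, then Q1 2020, ...); these positions regroup them by grade
wider_data <- wider_data %>% select(1,2,6,10,14,3,7,11,15,4,8,12,16,5,9,13,17)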
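The hard-coded positions only hold for exactly four grades and four quarters in this order of appearance. A sketch that derives the same positions from the column names instead; order_by_grade_quarter() is a hypothetical helper and assumes every name has the form "Grade G Qn YYYY" built by the paste() call above:

```r
# Hypothetical helper: returns the permutation that sorts names of the form
# "Grade 8 Q4 2019" by grade, then year, then quarter
order_by_grade_quarter <- function(nm) {
  pattern <- "^Grade ([0-9]+) Q([1-4]) ([0-9]{4})$"
  grade   <- as.integer(sub(pattern, "\\1", nm))
  quarter <- as.integer(sub(pattern, "\\2", nm))
  year    <- as.integer(sub(pattern, "\\3", nm))
  order(grade, year, quarter)
}

# Column names in pivot_wider()'s first-appearance order (all grades for
# Q4 2019, then Q1 2020, ...), rebuilt here so the example runs on its own
nm <- paste(rep(c("Grade 8", "Grade 9", "Grade 10", "Grade 11"), times = 4),
            rep(c("Q4 2019", "Q1 2020", "Q2 2020", "Q3 2020"), each = 4))

# "+ 1" skips the Type column, reproducing select(1, 2, 6, 10, 14, ...)
cols <- c(1, order_by_grade_quarter(nm) + 1)

# With the real data:
# wider_data <- wider_data[c(1, order_by_grade_quarter(names(wider_data)[-1]) + 1)]
```

Parsing the numbers also sorts "Grade 10" after "Grade 9", which a plain string sort would not.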

【Comments】:

【Solution 2】:

You can paste the two time variables together and use the result as a single time variable, like this:

    res <- reshape(transform(dat, time=paste(QuarterYear, Grade)), 
                   direction="wide", idvar="Type", timevar="time",
                   drop=c("QuarterYear", "Grade"))  
    res
    #             Type value.Q4 2019 Grade 8 value.Q4 2019 Grade 9
    # 1 overallAverage                  2.48                  3.48
    # 2             CT                  2.21                  3.03
    # 3             RT                  0.27                  0.45
    #   value.Q4 2019 Grade 10 value.Q4 2019 Grade 11 value.Q1 2020 Grade 8
    # 1                    4.6                    2.8                  2.54
    # 2                    4.0                    2.4                  2.28
    # 3                    0.6                    0.4                  0.26
    #   value.Q1 2020 Grade 9 value.Q1 2020 Grade 10 value.Q1 2020 Grade 11
    # 1                  3.45                   4.46                   3.56
    # 2                  3.00                   3.88                   2.81
    # 3                  0.45                   0.58                   0.75
    #   value.Q2 2020 Grade 8 value.Q2 2020 Grade 9 value.Q2 2020 Grade 10
    # 1                  2.47                  2.96                   4.10
    # 2                  2.14                  2.54                   3.69
    # 3                  0.33                  0.41                   0.41
    #   value.Q2 2020 Grade 11 value.Q3 2020 Grade 8 value.Q3 2020 Grade 9
    # 1                   3.44                  2.00                  2.54
    # 2                   2.61                  1.81                  2.26
    # 3                   0.83                  0.19                  0.28
    #   value.Q3 2020 Grade 10 value.Q3 2020 Grade 11
    # 1                   4.11                   2.67
    # 2                   3.68                   2.11
    # 3                   0.43                   0.56
    

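To sort the columns into the desired order, we can use substr(). This works because every name shares the fixed-width prefix "value.Qn YYYY Grade ", so the quarter digit is character 8, the year is characters 10 to 13, and the grade number starts at character 21: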

    nm <- names(res)[-1]  ## store names in a vector (without "Type")
    ## order by grade (chars 21-22), then year (chars 10-13), then quarter (char 8);
    ## "+ 1" shifts the positions past the "Type" column
    o <- order(as.double(substr(nm, 21, 22)), as.double(substr(nm, 10, 13)),
               as.double(substr(nm, 8, 8))) + 1
    res <- res[c(1, o)]  ## ordering
    names(res)
    #  [1] "Type"                   "value.Q4 2019 Grade 8"  "value.Q1 2020 Grade 8" 
    #  [4] "value.Q2 2020 Grade 8"  "value.Q3 2020 Grade 8"  "value.Q4 2019 Grade 9" 
    #  [7] "value.Q1 2020 Grade 9"  "value.Q2 2020 Grade 9"  "value.Q3 2020 Grade 9" 
    # [10] "value.Q4 2019 Grade 10" "value.Q1 2020 Grade 10" "value.Q2 2020 Grade 10"
    # [13] "value.Q3 2020 Grade 10" "value.Q4 2019 Grade 11" "value.Q1 2020 Grade 11"
    # [16] "value.Q2 2020 Grade 11" "value.Q3 2020 Grade 11"
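As the first output above shows, reshape() lays out the wide columns in the order the time values first appear. An alternative sketch that avoids character positions altogether: sort the rows by grade, year and quarter before reshaping, then reuse the same reshape() call. grade_num, year, quarter and sorted are illustrative names, and dat is rebuilt compactly with rep() so the block runs on its own:

```r
# Compact rebuild of the question's `dat` (same rows, same order, same values)
dat <- data.frame(
  QuarterYear = rep(c("Q4 2019", "Q1 2020", "Q2 2020", "Q3 2020"), each = 12),
  Grade = rep(rep(c("Grade 8", "Grade 9", "Grade 10", "Grade 11"), each = 3), times = 4),
  Type = rep(c("overallAverage", "CT", "RT"), times = 16),
  value = c(2.48, 2.21, 0.27, 3.48, 3.03, 0.45, 4.6, 4, 0.6, 2.8, 2.4, 0.4,
            2.54, 2.28, 0.26, 3.45, 3, 0.45, 4.46, 3.88, 0.58, 3.56, 2.81, 0.75,
            2.47, 2.14, 0.33, 2.96, 2.54, 0.41, 4.1, 3.69, 0.41, 3.44, 2.61, 0.83,
            2, 1.81, 0.19, 2.54, 2.26, 0.28, 4.11, 3.68, 0.43, 2.67, 2.11, 0.56),
  stringsAsFactors = FALSE
)

# Numeric sort keys parsed from labels such as "Q4 2019" and "Grade 8"
grade_num <- as.integer(sub("Grade ", "", dat$Grade, fixed = TRUE))
year      <- as.integer(substr(dat$QuarterYear, 4, 7))
quarter   <- as.integer(substr(dat$QuarterYear, 2, 2))

# Put rows in the desired column order first; order() is stable, so the
# three Type rows within each group keep their original order
sorted <- dat[order(grade_num, year, quarter), ]

# Same call as above, now producing the columns already grouped by grade
res <- reshape(transform(sorted, time = paste(QuarterYear, Grade)),
               direction = "wide", idvar = "Type", timevar = "time",
               drop = c("QuarterYear", "Grade"))
```

Sorting the long data is usually easier to maintain than renumbering wide columns, since the keys are parsed once from the original labels.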
    

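【Comments】:

• How can I get the quarters as consecutive columns within each grade? For example, the column order would be Q4 2019 Grade 8, Q1 2020 Grade 8, Q2 2020 Grade 8, Q3 2020 Grade 8, Q4 2019 Grade 9.
• @NevedhaAyyanar Using substr, see the edit.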