【Question title】: Pivot longer with blocks of variables
【Posted】: 2021-06-04 23:57:42
【Question】:

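I'm having trouble using pivot_longer on blocks of variables. The columns come in blocks (x1, x2), each holding the variables a, b and c, and I want one row per date and block with a, b and c kept as columns. Suppose I have this:
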

dfwide <- structure(list(date = structure(c(1577836800, 1577923200, 1578009600, 
1578096000, 1578182400, 1578268800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), x1_a = c(20, 15, 12, NA, 25, 27), x1_b = c(33, 
44, 85, 10, 12, 3), x1_c = c(70, 20, 87, 11, 20, 5), x2_a = c(85, 
65, 33, 46, 82, 9), x2_b = c(87, 25, 55, 64, 98, 5), x2_c = c(77, 
51, 92, 20, 37, 98)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame")) 

##Tried:
dfwide %>% 
  pivot_longer(cols = -date,
             names_sep = c("x1", "x2"),
             names_to = c("a", "b", "c"),
             values_to = "value")

【Comments】:

  • The data in dfwide is incomplete. Could you please fix your output?
  • Oops. Fixed.

Tags: r dplyr tidyverse reshape


【Solution 1】:

You can try this code:

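library(tidyverse)
dfwide %>% 
  # stack every non-date column into name/value pairs
  pivot_longer(cols = -date,
               names_to = "which",
               values_to = "value") %>%
  # split e.g. "x1_a" into the block "x1" and the letter "a"
  separate(which, into = c("which","letter"), sep = "_") %>%
  # spread the letters back out into columns a, b and c
  pivot_wider(names_from = "letter", values_from = "value") %>%
  arrange(which)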

This is the result:

# A tibble: 12 x 5
   date                which     a     b     c
   <dttm>              <chr> <dbl> <dbl> <dbl>
 1 2020-01-01 00:00:00 x1       20    33    70
 2 2020-01-02 00:00:00 x1       15    44    20
 3 2020-01-03 00:00:00 x1       12    85    87
 4 2020-01-04 00:00:00 x1       NA    10    11
 5 2020-01-05 00:00:00 x1       25    12    20
 6 2020-01-06 00:00:00 x1       27     3     5
 7 2020-01-01 00:00:00 x2       85    87    77
 8 2020-01-02 00:00:00 x2       65    25    51
 9 2020-01-03 00:00:00 x2       33    55    92
10 2020-01-04 00:00:00 x2       46    64    20
11 2020-01-05 00:00:00 x2       82    98    37
12 2020-01-06 00:00:00 x2        9     5    98
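
The three steps in the pipeline above can be traced on a smaller table. The following is only a sketch: `small`, `id` and the intermediate names `long` and `parts` are hypothetical and not part of the original question.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-row table using the same "block_letter" naming as dfwide
small <- tibble(
  id   = 1:2,
  x1_a = c(1, 2),
  x1_b = c(3, 4),
  x2_a = c(5, 6),
  x2_b = c(7, 8)
)

# Step 1: stack every x*_* column into name/value pairs
long <- pivot_longer(small, cols = -id, names_to = "which", values_to = "value")

# Step 2: split "x1_a" into the block "x1" and the letter "a"
parts <- separate(long, which, into = c("which", "letter"), sep = "_")

# Step 3: spread the letters back out into their own columns
result <- parts %>%
  pivot_wider(names_from = "letter", values_from = "value") %>%
  arrange(which, id)

result
```

Printing `long` and `parts` along the way shows how the table first grows to one row per original cell and then shrinks back to one row per id and block.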

【Discussion】:

    【Solution 2】:

    This single call uses the names_sep argument of pivot_longer, together with the special ".value" entry in names_to.

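    library(tidyverse)
    # names_sep splits "x1_a" into "x1" (stored in `which`) and "a";
    # ".value" makes that second piece the name of an output column
    pivot_longer(dfwide, -date, names_sep = "_", 
                 names_to = c("which", ".value")) %>% 
      arrange(which)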
    
    
    # A tibble: 12 x 5
       date                which     a     b     c
       <dttm>              <chr> <dbl> <dbl> <dbl>
     1 2020-01-01 00:00:00 x1       20    33    70
     2 2020-01-02 00:00:00 x1       15    44    20
     3 2020-01-03 00:00:00 x1       12    85    87
     4 2020-01-04 00:00:00 x1       NA    10    11
     5 2020-01-05 00:00:00 x1       25    12    20
     6 2020-01-06 00:00:00 x1       27     3     5
     7 2020-01-01 00:00:00 x2       85    87    77
     8 2020-01-02 00:00:00 x2       65    25    51
     9 2020-01-03 00:00:00 x2       33    55    92
    10 2020-01-04 00:00:00 x2       46    64    20
    11 2020-01-05 00:00:00 x2       82    98    37
    12 2020-01-06 00:00:00 x2        9     5    98
    

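    【Discussion】:

    • Just one question @Dave2e: how does .value work? Nice answer, by the way.
    • "The special name .value: this tells pivot_longer() that that part of the column name specifies the "value" being measured." My suggestion is to go to the tidyr vignette on pivoting and read the "Multiple observations per row" section (it is a few sections down from the Pivot Longer heading).
    • Thanks for the answer @Dave2e!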
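
To make the effect of ".value" discussed above more concrete, here is a minimal sketch that runs the same pivot twice on a small table, once with an ordinary name in the second slot of names_to and once with ".value". The table `demo` and the names `id`, `letter`, `stacked` and `by_value` are assumptions made for illustration only.

```r
library(tidyr)

# Hypothetical table using the same "block_letter" naming as dfwide
demo <- tibble(
  id   = 1:2,
  x1_a = c(1, 2),
  x1_b = c(3, 4),
  x2_a = c(5, 6),
  x2_b = c(7, 8)
)

# Ordinary name in the second slot: the letter becomes data in a
# `letter` column and every measurement lands in a single `value` column
stacked <- pivot_longer(demo, -id,
                        names_sep = "_",
                        names_to = c("which", "letter"),
                        values_to = "value")

# ".value" in the second slot: the letter becomes the name of an
# output column, so `a` and `b` stay separate columns
by_value <- pivot_longer(demo, -id,
                         names_sep = "_",
                         names_to = c("which", ".value"))

dim(stacked)    # one row per original cell
dim(by_value)   # one row per id and block
```

In the second call no values_to is needed, because the output column names are taken from the column names themselves.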
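    【Solution 3】:

    If you don't mind doing this in several steps, this approach works. First gather the columns into long form, then split the names at the underscore, then spread the values back out.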

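    library(tidyverse)
    # Stack the columns, split each name at the underscore, then spread the letters
    pivot_longer(dfwide, x1_a:x2_c, names_to = "which") %>% 
      extract(which, into = c("var", "letter"), regex = "(.*)_(.*)") %>%
      pivot_wider(names_from = letter, values_from = value)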
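
As a possible variation, the same regular expression can be passed to pivot_longer through its names_pattern argument, which should fold the three steps into one call. This is a sketch on a hypothetical table (`demo` and `id` are not from the original question), with a check that both routes agree.

```r
library(tidyr)
library(dplyr)

# Hypothetical table using the same "block_letter" naming as dfwide
demo <- tibble(
  id   = 1:2,
  x1_a = c(1, 2),
  x1_b = c(3, 4),
  x2_a = c(5, 6),
  x2_b = c(7, 8)
)

# Multi-step route, as in the answer above
multi_step <- pivot_longer(demo, -id, names_to = "which") %>%
  extract(which, into = c("var", "letter"), regex = "(.*)_(.*)") %>%
  pivot_wider(names_from = letter, values_from = value)

# One-step route: the first capture group goes to `var`,
# the second names the output columns via ".value"
one_step <- pivot_longer(demo, -id,
                         names_pattern = "(.*)_(.*)",
                         names_to = c("var", ".value"))

# Compare after sorting so that row order does not matter
all.equal(as.data.frame(arrange(multi_step, var, id)),
          as.data.frame(arrange(one_step, var, id)))
```

The multi-step version can still be easier to debug, since each intermediate table can be inspected on its own.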
    

    【Discussion】:
