R pivot_longer: combining columns based on the end of column names (answers)

【Question title】: R pivot_longer combining columns based on the end of column names
【Posted】: 2020-12-05 14:34:10
【Question description】:

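I have a data frame with multiple columns whose names differ in length and structure (so I am not sure how to capture them with a regular expression). Each column name ends in .t1 or .t3.

I would like to combine the columns that share the same name once the t1/t3 suffix is removed, and add a Time column that holds that suffix.

So, for example, given a data frame such as: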

df<-data.frame("Subject"= c(1:10),
"intercept.freq.acc.t1" = c(1:10),
"intercept.freq.acc.t3" = c(1:10),
"freq.rt.t1" = c(1:10), 
"freq.rt.t3" = c(1:10),
"vowel.con.acc.t1" = c(1:10),
"vowel.con.acc.t3" = c(1:10))

I would like to turn it into:

df<-data.frame("Subject"= rep(1:10,2),
"Time" = rep(c('t1','t3'), each = 10),
"intercept.freq.acc" = rep(1:10, 2),
"freq.rt" = rep(1:10,2), 
"vowel.con.acc" = rep(1:10, 2))

How can I do this?

【Discussion】:

Tags: r dataframe tidyr

【Solution 1】:

You can use:

tidyr::pivot_longer(df, 
             cols = -Subject, 
             names_to = c('.value', 'Time'), 
             names_pattern = '(.*)\\.(t\\d+)')

#   Subject Time  intercept.freq.acc freq.rt vowel.con.acc
#     <int> <chr>              <int>   <int>         <int>
# 1       1 t1                     1       1             1
# 2       1 t3                     1       1             1
# 3       2 t1                     2       2             2
# 4       2 t3                     2       2             2
# 5       3 t1                     3       3             3
# 6       3 t3                     3       3             3
# 7       4 t1                     4       4             4
# 8       4 t3                     4       4             4
# 9       5 t1                     5       5             5
#10       5 t3                     5       5             5
#11       6 t1                     6       6             6
#12       6 t3                     6       6             6
#13       7 t1                     7       7             7
#14       7 t3                     7       7             7
#15       8 t1                     8       8             8
#16       8 t3                     8       8             8
#17       9 t1                     9       9             9
#18       9 t3                     9       9             9
#19      10 t1                    10      10            10
#20      10 t3                    10      10            10
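
To see why this pattern is enough even though the names differ in length, it can help to apply it to a few column names with base R's sub(). This is only an illustrative sketch; the variable names nms, value_part and time_part are made up for the example. The greedy (.*) takes everything up to the last dot that is followed by t and digits (this becomes the .value part), and (t\\d+) captures the suffix that ends up in Time.

```r
# A few of the column names from the question
nms <- c("intercept.freq.acc.t1", "freq.rt.t3", "vowel.con.acc.t1")

# Same pattern as passed to names_pattern above
pattern <- "(.*)\\.(t\\d+)"

# First capture group: the name without the time suffix
value_part <- sub(pattern, "\\1", nms)
# Second capture group: the time suffix
time_part <- sub(pattern, "\\2", nms)

print(value_part)  # "intercept.freq.acc" "freq.rt" "vowel.con.acc"
print(time_part)   # "t1" "t3" "t1"
```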

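【Discussion】:

【Solution 2】:

You can use the pivot_longer_spec function. You build a template (a data frame) that specifies, for each input column, which output column it feeds and what value the Time column gets, and then pass that template to pivot_longer_spec.

This is often very useful when your column names do not follow a nice, simple pattern that can be split. Personally, I find such a template easier to work with than figuring out a regular expression to split the columns (although in this case a regular expression works fine):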

library(tidyverse)
template <- data.frame(.name  = colnames(df)[-1],
                       .value = c("intercept.freq.acc", "intercept.freq.acc", "freq.rt", "freq.rt", "vowel.con.acc", "vowel.con.acc"),
                       Time   = c("t1", "t3", "t1", "t3", "t1", "t3"))

The template looks like this:

                  .name             .value Time
1 intercept.freq.acc.t1 intercept.freq.acc   t1
2 intercept.freq.acc.t3 intercept.freq.acc   t3
3            freq.rt.t1            freq.rt   t1
4            freq.rt.t3            freq.rt   t3
5      vowel.con.acc.t1      vowel.con.acc   t1
6      vowel.con.acc.t3      vowel.con.acc   t3

Then the pivot itself is a single call to pivot_longer_spec:

dat_long <- df %>%
  pivot_longer_spec(template)

which gives:

# A tibble: 20 x 5
   Subject Time  intercept.freq.acc freq.rt vowel.con.acc
     <int> <chr>              <int>   <int>         <int>
 1       1 t1                     1       1             1
 2       1 t3                     1       1             1
 3       2 t1                     2       2             2
 4       2 t3                     2       2             2
 5       3 t1                     3       3             3
 6       3 t3                     3       3             3
 7       4 t1                     4       4             4
 8       4 t3                     4       4             4
 9       5 t1                     5       5             5
10       5 t3                     5       5             5
11       6 t1                     6       6             6
12       6 t3                     6       6             6
13       7 t1                     7       7             7
14       7 t3                     7       7             7
15       8 t1                     8       8             8
16       8 t3                     8       8             8
17       9 t1                     9       9             9
18       9 t3                     9       9             9
19      10 t1                    10      10            10
20      10 t3                    10      10            10

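【Discussion】:

Thanks, this is nice, although it is less useful for me because the actual data frame has dozens of different columns!
If it can be broken down into a regex pattern, that is great. But sometimes you really do not have such a pattern. Usually I just create the template in Excel, where you can easily auto-complete the names, and then import the template into R. Or, in the example above, the creation of the template could be simplified with a couple of rep calls.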
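
The rep-based shortcut mentioned in the last comment could look roughly like the sketch below. It assumes that every measure has both a t1 and a t3 column named <measure>.<time>, in the same order as in the question's data frame; value_names and times are illustrative names that do not appear in the answer.

```r
library(tidyr)

# Example data from the question
df <- data.frame(Subject = 1:10,
                 intercept.freq.acc.t1 = 1:10,
                 intercept.freq.acc.t3 = 1:10,
                 freq.rt.t1 = 1:10,
                 freq.rt.t3 = 1:10,
                 vowel.con.acc.t1 = 1:10,
                 vowel.con.acc.t3 = 1:10)

# Assumed inputs: the output column names and the time suffixes
value_names <- c("intercept.freq.acc", "freq.rt", "vowel.con.acc")
times <- c("t1", "t3")

# Build the spec with rep() instead of typing every row by hand
template <- data.frame(
  .name  = paste(rep(value_names, each = length(times)),
                 rep(times, times = length(value_names)), sep = "."),
  .value = rep(value_names, each = length(times)),
  Time   = rep(times, times = length(value_names)),
  stringsAsFactors = FALSE
)

dat_long <- pivot_longer_spec(df, template)
```

With dozens of columns only value_names has to be maintained. When the names do follow a pattern after all, tidyr::build_longer_spec() can generate a comparable spec from the data, which can then be edited by hand before pivoting.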

【Solution 3】:

We can use melt from data.table:

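library(data.table)
# Anchor each pattern with ^ so that 'freq' does not also match the intercept.freq.acc columns
melt(setDT(df), id.vars = 'Subject', measure.vars = patterns('^intercept', '^freq', '^vowel'),
     value.name = c('intercept.freq.acc', 'freq.rt', 'vowel.con.acc'))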
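
Note that melt with several patterns labels the rows by the position of the column within each group (1, 2) rather than by the t1/t3 suffix. A possible way to get a Time column like the one requested is sketched below. It assumes the t1 column precedes the t3 column in every group, and it anchors the patterns with ^ so that 'freq' cannot match the intercept.freq.acc columns; the name long is illustrative.

```r
library(data.table)

# Example data from the question
df <- data.frame(Subject = 1:10,
                 intercept.freq.acc.t1 = 1:10,
                 intercept.freq.acc.t3 = 1:10,
                 freq.rt.t1 = 1:10,
                 freq.rt.t3 = 1:10,
                 vowel.con.acc.t1 = 1:10,
                 vowel.con.acc.t3 = 1:10)

long <- melt(
  as.data.table(df),
  id.vars = "Subject",
  measure.vars = patterns("^intercept", "^freq", "^vowel"),
  variable.name = "Time",
  value.name = c("intercept.freq.acc", "freq.rt", "vowel.con.acc")
)

# Time holds the position within each group (1, 2); map it to the suffix.
# Assumption: within every group the t1 column comes before the t3 column.
long[, Time := c("t1", "t3")[as.integer(Time)]]
```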

【Discussion】: