pivot_longer with names_pattern [duplicate]: answers

【Question title】: pivot_longer with names_pattern [duplicate]
【Posted】: 2021-02-22 09:32:04
【Question】:

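I am quite new to programming in general, but I need to write reproducible scripts for large datasets. I hope the example below is sufficient.

I have a data frame like this (the real one has 8 more "nutrients", 5 "trade elements", and many more years):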

Year<-c(1961,1962)
Total_Energy_kcal_Production<-c(5,8)
Total_Energy_kcal_Import<-c(6,1)
Total_Ca_g_Production<-c(3,4)
Total_Ca_g_Import<-c(3,8)
df<-cbind(Year,Total_Energy_kcal_Production, Total_Energy_kcal_Import, Total_Ca_g_Production, Total_Ca_g_Import)

It looks like this:

Year  Total_Energy_kcal_Production   Total_Energy_kcal_Import   Total_Ca_g_Production    Total_Ca_g_Import 
1961   5                              6                          3                       3
1962   8                              1                          4                       8

I would like it to look like this:

Year  Nutrient            Production        Import
1961  Total_Energy_kcal   5                 6
1962  Total_Energy_kcal   8                 1
1961  Total_Ca_g          3                 3 
1962  Total_Ca_g          4                 8

I have tried a lot with pivot_longer and names_pattern. I thought the following would work, but I do not fully understand the arguments:

df_piv<-df%>%
  pivot_longer(cols = -Year, names_to = "Nutrient", 
              names_pattern = ".*(?=_)")

I get an error message that I cannot interpret:

Error: Can't select within an unnamed vector.

【Comments】:

Tags: r dplyr pivot

【Solution 1】:

You can supply the names_pattern regular expression like this:

tidyr::pivot_longer(df, 
                    cols = -Year, 
                    names_to = c('Nutrient', '.value'),
                    names_pattern = '(.*)_(\\w+)')

#   Year Nutrient          Production Import
#  <dbl> <chr>                  <dbl>  <dbl>
#1  1961 Total_Energy_kcal          5      6
#2  1961 Total_Ca_g                 3      3
#3  1962 Total_Energy_kcal          8      1
#4  1962 Total_Ca_g                 4      8

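This puts everything before the last underscore into the Nutrient column. The part after it (Production or Import) is kept as the name of an output column, because of the special '.value' entry in names_to.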
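To see how the pattern splits each name, the two capture groups can be pulled out with base R's sub(). This is only a sketch of the regex behaviour, not of what tidyr does internally; the variable names nms, pattern, nutrient, and value are made up for illustration, and the pattern is anchored with ^ and $ so that sub() replaces the whole name.

```r
# Column names from the question (everything except Year)
nms <- c("Total_Energy_kcal_Production", "Total_Energy_kcal_Import",
         "Total_Ca_g_Production", "Total_Ca_g_Import")

# Same pattern as in the answer, anchored so sub() replaces the whole name
pattern <- "^(.*)_(\\w+)$"

# First capture group: the text that ends up in the Nutrient column
nutrient <- sub(pattern, "\\1", nms, perl = TRUE)

# Second capture group: the text that '.value' turns into column names
value <- sub(pattern, "\\2", nms, perl = TRUE)

print(data.frame(name = nms, nutrient, value))
```

Each name is split at its last underscore, which gives two distinct nutrients and the two output columns Production and Import.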

Data

cbind creates a matrix, not a data frame, so use data.frame to create the data.

df<-data.frame(Year,Total_Energy_kcal_Production,Total_Energy_kcal_Import, 
               Total_Ca_g_Production, Total_Ca_g_Import)
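The difference between the two constructors can be checked directly. A minimal sketch in base R, using only three of the columns from the question; the object names m, df_ok, and df_conv are just for illustration.

```r
Year <- c(1961, 1962)
Total_Energy_kcal_Production <- c(5, 8)
Total_Energy_kcal_Import <- c(6, 1)

# cbind() on plain numeric vectors returns a numeric matrix
m <- cbind(Year, Total_Energy_kcal_Production, Total_Energy_kcal_Import)
print(class(m))
print(is.data.frame(m))      # FALSE

# data.frame() returns a data frame with the same column names
df_ok <- data.frame(Year, Total_Energy_kcal_Production, Total_Energy_kcal_Import)
print(is.data.frame(df_ok))  # TRUE

# An existing matrix can also be converted after the fact
df_conv <- as.data.frame(m)
print(names(df_conv))
```

If the data already exists as a matrix, as.data.frame() is an alternative to rebuilding it with data.frame().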

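【Comments】:

Thank you very much! When I tried this earlier, cbind was what caused the problem. Thanks a lot!
@Ronak shah, could you explain '\\w+'? I also tried df3 %>% pivot_longer(cols = Total_Energy_kcal_Production:Total_Ca_g_Import , names_to = c('Nutrient','.value'), names_pattern='(..*)_(.*)' ) and it works! But I do not know why.
It depends on the column names we have. Here the part we need as the column name is a single word (Production or Import), so \\w+ works. Since regular expressions are greedy, your option with (.*)_(.*) works as well.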
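The effect of greediness can be checked on a single name by comparing a greedy and a lazy first group. A small sketch in base R, with perl = TRUE so that the lazy quantifier .*? is available; the helper split_name is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: return both capture groups for one column name
split_name <- function(pattern, x) {
  c(first  = sub(pattern, "\\1", x, perl = TRUE),
    second = sub(pattern, "\\2", x, perl = TRUE))
}

x <- "Total_Ca_g_Import"

# Greedy first group: takes as much as it can,
# so the split happens at the LAST underscore
greedy <- split_name("^(.*)_(.*)$", x)

# Lazy first group: takes as little as it can,
# so the split happens at the FIRST underscore
lazy <- split_name("^(.*?)_(.*)$", x)

print(greedy)
print(lazy)
```

The greedy version yields Total_Ca_g and Import, while the lazy version yields Total and Ca_g_Import. Note that \\w also matches an underscore, so in (.*)_(\\w+) it appears to be the greedy first group, rather than \\w+ itself, that pushes the split to the last underscore.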