How to extract parts of a string using R and place them in different columns: answers

[Question title]: How to extract parts of a string using R and place them in different columns
[Posted]: 2020-02-06 15:28:19
[Question]:

I have a column of text from which I want to extract some information and put it into new columns.

Example:

text1 <- "issue : there has been considerable changes and it is going on. finding : we need further investigation on this. resolution: please check the validity"

text2 <- "issue : there has been considerable changes and it is going on. resolution: please check the validity"

text3 <- "finding : we need further investigation on this. resolution: please check the validity"

text4 <- "please check the validity"

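The solution I'm looking for: after applying an R regular expression, the expected result should look like the following, with the text split into 3 different columns depending on which parts are present.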

Text 1

issue <- there has been considerable changes and it is going on 

finding <- we need further investigation on this

resolution <- please check the validity

Text 2

issue <- there has been considerable changes and it is going on

finding <- NA

resolution <- please check the validity

Text 3

issue <- NA

finding <- we need further investigation on this

resolution <- please check the validity

Text 4

issue <- NA

finding <- NA

resolution <- please check the validity
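Since the question asks for a regular-expression approach, here is a minimal base-R sketch that produces the table above. It assumes each field is written as "key :" (the space before the colon is optional) and that its value ends at the next period or at the end of the string, so a value that itself contains a period would be cut short. The helper extract_field and the rule that a completely unlabelled string is the resolution are illustrative assumptions, not part of the original question.

```r
texts <- c(
  "issue : there has been considerable changes and it is going on. finding : we need further investigation on this. resolution: please check the validity",
  "issue : there has been considerable changes and it is going on. resolution: please check the validity",
  "finding : we need further investigation on this. resolution: please check the validity",
  "please check the validity"
)

# Hypothetical helper: returns the text after "<key> :" up to the next "."
# (or the end of the string), or NA when the key is absent
extract_field <- function(x, key) {
  pattern <- paste0("\\b", key, "\\s*:\\s*([^.]*)")
  m <- regexec(pattern, x, perl = TRUE)
  vapply(regmatches(x, m),
         function(g) if (length(g) == 2) trimws(g[2]) else NA_character_,
         character(1))
}

out <- data.frame(
  issue      = extract_field(texts, "issue"),
  finding    = extract_field(texts, "finding"),
  resolution = extract_field(texts, "resolution"),
  stringsAsFactors = FALSE
)

# Assumption: a string with no "key :" label at all is treated as the resolution
no_label <- !grepl("\\b(issue|finding|resolution)\\s*:", texts, perl = TRUE)
out$resolution[no_label] <- trimws(texts[no_label])
print(out)
```

Because each field is extracted independently, this does not need one pattern per combination of fields, at the cost of relying on the period as the field separator.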

Please help.


Tags: r regex split

[Solution 1]:

You can use the unglue package:

library(unglue)
text <- c(
  "issue : there has been considerable changes and it is going on. finding : we need further investigation on this. resolution: please check the validity",
  "issue : there has been considerable changes and it is going on. resolution: please check the validity",
  "finding : we need further investigation on this. resolution: please check the validity",
  "please check the validity")

patterns <- c("issue : {issue}. finding : {finding}. resolution: {resolution}",
              "issue : {issue}. resolution: {resolution}",
              "finding : {finding}. resolution: {resolution}")

unglue_data(text, patterns)
#>                                                    issue
#> 1 there has been considerable changes and it is going on
#> 2 there has been considerable changes and it is going on
#> 3                                                   <NA>
#> 4                                                   <NA>
#>                                 finding                resolution
#> 1 we need further investigation on this please check the validity
#> 2                                  <NA> please check the validity
#> 3 we need further investigation on this please check the validity
#> 4                                  <NA>                      <NA>

Created on 2019-10-09 by the reprex package (v0.3.0)

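This won't help much if your real data has more combinations than this example, but here it works fine: we give the 3 possible patterns in a vector, the first pattern that matches is used, and the result is a data frame with one column per extracted variable. Note that the fourth string matches none of the patterns, so all of its columns are NA.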
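The "first matching pattern wins" behaviour can be mimicked in base R, which may help if installing unglue is not an option. This is a minimal sketch, not unglue's actual implementation: the patterns are hand-written PCRE regexes with named groups, parse_one is a hypothetical helper, and the last pattern is an assumed catch-all that treats unlabelled text as the resolution, so the fourth string gets resolution = "please check the validity" as the question expects.

```r
texts <- c(
  "issue : there has been considerable changes and it is going on. finding : we need further investigation on this. resolution: please check the validity",
  "issue : there has been considerable changes and it is going on. resolution: please check the validity",
  "finding : we need further investigation on this. resolution: please check the validity",
  "please check the validity"
)

# Candidate patterns, tried in order; named groups become output columns
patterns <- c(
  "^issue : (?<issue>.*?)\\. finding : (?<finding>.*?)\\. resolution: (?<resolution>.*)$",
  "^issue : (?<issue>.*?)\\. resolution: (?<resolution>.*)$",
  "^finding : (?<finding>.*?)\\. resolution: (?<resolution>.*)$",
  "^(?<resolution>.*)$"  # assumption: unlabelled text is treated as the resolution
)
fields <- c("issue", "finding", "resolution")

# Hypothetical helper: parse one string with the first pattern that matches
parse_one <- function(s) {
  row <- setNames(rep(NA_character_, length(fields)), fields)
  for (p in patterns) {
    m <- regexpr(p, s, perl = TRUE)
    if (m == -1L) next                     # no match, try the next pattern
    starts <- attr(m, "capture.start")
    lens   <- attr(m, "capture.length")
    nms    <- attr(m, "capture.names")
    for (i in seq_along(nms)) {
      row[nms[i]] <- substr(s, starts[i], starts[i] + lens[i] - 1)
    }
    break                                  # first matching pattern wins
  }
  row
}

result <- as.data.frame(do.call(rbind, lapply(texts, parse_one)),
                        stringsAsFactors = FALSE)
print(result)
```

As with the unglue version, every new combination of fields needs its own pattern, and pattern order matters because the catch-all would match every string.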

To start from a data frame and add the extracted columns to it, use unglue_unnest(your_df, your_col, patterns); set remove = FALSE if you want to keep the original column.
