【Question title】: How to extract parts of a string using R and place them in different columns
【Posted】: 2020-02-06 15:28:19
【Question description】:

I have a column of text from which I want to extract some pieces of information and put each of them into a new column.

Example:

Text_1 <- "issue : there has been considerable changes and it is going on. finding : we need further investigation on this. resolution: please check the validity"

Text_2 <- "issue : there has been considerable changes and it is going on. resolution: please check the validity"

Text_3 <- "finding : we need further investigation on this. resolution: please check the validity"

Text_4 <- "please check the validity"

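I am looking for a solution, ideally using R regular expressions, that produces the result shown below. Each text should be split into 3 separate columns depending on which parts are present; a part that is missing should be NA.

Text 1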

issue <- there has been considerable changes and it is going on 

finding <- we need further investigation on this

resolution <- please check the validity

Text 2

issue <- there has been considerable changes and it is going on

finding <- NA

resolution <- please check the validity

Text 3

issue <- NA

finding <- we need further investigation on this

resolution <- please check the validity

Text 4

issue <- NA

finding <- NA

resolution <- please check the validity

Please help.

【Comments on the question】:

    Tags: r regex split


    【Solution 1】:

    You can use the unglue package:

    library(unglue)
    text <- c(
      "issue : there has been considerable changes and it is going on. finding : we need further investigation on this. resolution: please check the validity",
      "issue : there has been considerable changes and it is going on. resolution: please check the validity",
      "finding : we need further investigation on this. resolution: please check the validity",
      "please check the validity")
    
    patterns <- c("issue : {issue}. finding : {finding}. resolution: {resolution}",
                  "issue : {issue}. resolution: {resolution}",
                  "finding : {finding}. resolution: {resolution}")
    
    unglue_data(text, patterns)
    #>                                                    issue
    #> 1 there has been considerable changes and it is going on
    #> 2 there has been considerable changes and it is going on
    #> 3                                                   <NA>
    #> 4                                                   <NA>
    #>                                 finding                resolution
    #> 1 we need further investigation on this please check the validity
    #> 2                                  <NA> please check the validity
    #> 3 we need further investigation on this please check the validity
    #> 4                                  <NA>                      <NA>
    

    Created on 2019-10-09 by the reprex package (v0.3.0)

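    This will not help much if your real data has many more combinations than this example, but for a case like this it works well: the 3 possible patterns are supplied in a vector, the first pattern that matches each string is used, and the result is a data frame with one column per extracted variable. Note that the fourth string has no labels and matches none of the 3 patterns, so every column is NA for that row.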
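If the number of label combinations grows, listing every pattern becomes impractical. A possible alternative, not part of the answer above, is to extract each field independently with base R regular expressions. The sketch below assumes that each field ends either at a period followed by the next label or at the end of the string, and that a string with no labels at all should be treated as the resolution (as in the expected output for the fourth text). The helper names `extract_field` and `extract_all` are hypothetical.

```r
text <- c(
  "issue : there has been considerable changes and it is going on. finding : we need further investigation on this. resolution: please check the validity",
  "issue : there has been considerable changes and it is going on. resolution: please check the validity",
  "finding : we need further investigation on this. resolution: please check the validity",
  "please check the validity")

# Hypothetical helper: capture the text that follows "<key> :" up to the
# next ". <label> :" or the end of the string; NA when the key is absent.
extract_field <- function(x, key, keys = c("issue", "finding", "resolution")) {
  next_label <- paste0("\\.\\s*(?:", paste(keys, collapse = "|"), ")\\s*:")
  pattern <- paste0("\\b", key, "\\s*:\\s*(.*?)\\s*(?=", next_label, "|$)")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(hits,
         function(hit) if (length(hit) == 2) hit[2] else NA_character_,
         character(1))
}

# Hypothetical helper: one column per label, independent of which
# combination of labels a given string contains.
extract_all <- function(x, keys = c("issue", "finding", "resolution")) {
  out <- as.data.frame(
    lapply(setNames(keys, keys), function(k) extract_field(x, k, keys)),
    stringsAsFactors = FALSE
  )
  # Assumption: an unlabelled string is a bare resolution.
  no_label <- rowSums(!is.na(out)) == 0
  out$resolution[no_label] <- trimws(x[no_label])
  out
}

result <- extract_all(text)
```

Because each label is searched for separately, adding a new label only means extending `keys`, rather than enumerating every combination of labels as a separate pattern.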

    To start from a data frame and add the extracted columns to it, use unglue_unnest(your_df, your_col, patterns); set remove = FALSE if you want to keep the original column.
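A minimal sketch of the data frame workflow is shown here. The data frame `your_df` and its columns `id` and `notes` are made-up names for illustration. The fourth pattern, `"{resolution}"`, is an addition that is not in the original answer: it is assumed to act as a catch-all so that a string with no labels ends up in the resolution column, which is what the question asks for.

```r
library(unglue)

# Hypothetical input: `id` and `notes` are illustrative column names.
your_df <- data.frame(
  id = 1:4,
  notes = c(
    "issue : there has been considerable changes and it is going on. finding : we need further investigation on this. resolution: please check the validity",
    "issue : there has been considerable changes and it is going on. resolution: please check the validity",
    "finding : we need further investigation on this. resolution: please check the validity",
    "please check the validity"),
  stringsAsFactors = FALSE
)

# Patterns are tried in order and the first match is used, so the
# catch-all must come last.
patterns <- c(
  "issue : {issue}. finding : {finding}. resolution: {resolution}",
  "issue : {issue}. resolution: {resolution}",
  "finding : {finding}. resolution: {resolution}",
  "{resolution}")  # assumption: matches any string without labels

# remove = FALSE keeps the original `notes` column next to the new ones.
result <- unglue_unnest(your_df, notes, patterns, remove = FALSE)
```

Since the catch-all matches any string, it should only be used when unlabelled text can safely be interpreted as a resolution.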
