How do I keep part of the string used for pattern matching in a regular expression when replacing? (Answers)

【Question title】: How do I keep part of the character string used in regular expressions for pattern matching when replacing?
【Posted】: 2019-10-02 16:17:57
【Question description】:

I'm using stringr to help manipulate some html code stored in a character vector. The code looks like this:

foo <- 'text-align:left;"> 4: Forging Foundations </td>\n'

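In my full html code, several different strings appear in the position of 4: Forging Foundations, so I need to match on this whole piece of code when replacing. The final text output I'm looking for is: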

'text-align:left;background-color: #B0fff4 !important;"> 4: Forging Foundations </td>\n'

So I tried using the . regex character and the * quantifier in place of 4: Forging Foundations:

foo <- str_replace_all(
  foo,
  'text-align:left;">.*(?=</td>\n)',
  'text-align:left;background-color: #B0fff4 !important">.*(?=</td>\n)'
)

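However, this ends up replacing part of the original string with the literal regex syntax I used. I'm looking for some way to leave that part of the character vector untouched.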

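【Comments】:

You do know that you can't use a regex pattern in the replacement string?
Yes, I realized that after trying this. I'm still new to regex.
If you seem to be replacing a hard-coded fixed string, why use a regular expression at all? Try sub('text-align:left;">', 'text-align:left;background-color: #B0fff4 !important;">', foo, fixed=TRUE)
Other parts of my html code also contain 'text-align:left;"> , but I don't want to replace those, only the ones with a structure similar to foo.
Then, gsub('text-align:left;">([^<]*</td>)', 'text-align:left;background-color: #B0fff4 !important;">\\1', foo)? If you need to replace all occurrences, gsub seems to be a good enough base R function.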

Tags: r regex stringr

【Solution 1】:

You can use

gsub('text-align:left;">([^<]*</td>)', 'text-align:left;background-color: #B0fff4 !important;">\\1', foo)
# => [1] "text-align:left;background-color: #B0fff4 !important;\"> 4: Forging Foundations </td>\n"

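The ([^<]*</td>) part is a capturing group that matches zero or more characters other than <, followed by </td>. In the replacement pattern, the \\1 backreference (the R string literal for \1) puts the captured part back, so only the text before it is changed.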
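
Since the question uses stringr, the same capture-group idea can probably be carried over to str_replace_all, which also accepts \\1 in the replacement. The following is only a minimal sketch: the second element of foo is a hypothetical string added here to show that a 'text-align:left;">' not directly followed by plain text and </td> is left alone.

```r
library(stringr)

foo <- c(
  'text-align:left;"> 4: Forging Foundations </td>\n',
  # hypothetical element: a tag follows the prefix, so the pattern should not match
  'text-align:left;"> <b>other cell</b> </td>\n'
)

# The capturing group keeps the cell text and the closing </td>
pattern <- 'text-align:left;">([^<]*</td>)'
# \\1 re-inserts whatever the group captured
replacement <- 'text-align:left;background-color: #B0fff4 !important;">\\1'

out <- str_replace_all(foo, pattern, replacement)
cat(out, sep = "")
```

Here only the first element gains the background-color declaration. The second is returned unchanged because [^<]* cannot cross the <b> tag to reach </td>.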

【Comments】: