How to replace matches in a string and index each match

【Question title】: How to replace matches in a string and index each match
【Posted】: 2017-11-02 15:33:06
【Question】:

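A given string can contain multiple instances of the pattern I am trying to match. For example, if my pattern is <N(.+?)N> and my string is "My name is <N Timon N> and his name is <N Pumba N>", there are two matches. I want to replace each match with a replacement that includes the index of the match being replaced.

So for the string "My name is <N Timon N> and his name is <N Pumba N>", I want to end up with "My name is [Name #1] and his name is [Name #2]".

How can I do this, preferably with a single function, and preferably with functions from stringr or stringi?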

【Question discussion】:

Tags: r string substring stringr stringi

【Solution 1】:

You can do this in base R with gregexpr and regmatches:

my_string = "My name is <N Timon N> and his name is <N Pumba N>"

# Get the positions of the matches in the string
m = gregexpr("<N(.+?)N>", my_string, perl = TRUE)

# Index each match and replace text using the indices
match_indices = 1:length(unlist(m))

regmatches(my_string, m) = list(paste0("[Name #", match_indices, "]"))

Result:

> my_string
# [1] "My name is [Name #1] and his name is [Name #2]"
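
Since the question asks for a single function, the steps above could be wrapped up roughly as follows. This is only a sketch: the function name index_matches and its label argument are made up for illustration, it handles one string at a time, and it returns the input unchanged when nothing matches.

```r
# Hypothetical helper (not part of base R): number each match in a single string
index_matches <- function(x, pattern, label = "Name") {
  stopifnot(is.character(x), length(x) == 1L)
  m <- gregexpr(pattern, x, perl = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions only
  n <- sum(m[[1]] > 0)
  if (n == 0L) return(x)
  # Replace the i-th match with "[<label> #i]"
  regmatches(x, m) <- list(paste0("[", label, " #", seq_len(n), "]"))
  x
}

index_matches("My name is <N Timon N> and his name is <N Pumba N>", "<N(.+?)N>")
# [1] "My name is [Name #1] and his name is [Name #2]"
```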

Note:

This solution treats identical matches as different "names" when they occur more than once. For example:

my_string = "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"


m = gregexpr("<N(.+?)N>", my_string, perl = TRUE)

match_indices = 1:length(unlist(m))

regmatches(my_string, m) = list(paste0("[Name #", match_indices, "]"))

Output:

> my_string
[1] "My name is [Name #1] and his name is [Name #2], [Name #3] again"
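
If repeated names should share one index instead, a possible base R variation is to number the matches by their first appearance with match() and unique(). This is a sketch, not part of the original answer; the function name index_unique_matches and the label argument are hypothetical.

```r
# Hypothetical helper: identical matches get the same index
index_unique_matches <- function(x, pattern, label = "Name") {
  stopifnot(is.character(x), length(x) == 1L)
  m <- gregexpr(pattern, x, perl = TRUE)
  found <- regmatches(x, m)[[1]]
  if (length(found) == 0L) return(x)
  # Position of each match among the distinct matches, in order of first appearance
  idx <- match(found, unique(found))
  regmatches(x, m) <- list(paste0("[", label, " #", idx, "]"))
  x
}

index_unique_matches(
  "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again",
  "<N(.+?)N>"
)
# [1] "My name is [Name #1] and his name is [Name #2], [Name #1] again"
```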

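【Discussion】:

@BIQS Thanks for the edit, but I prefer to keep things simple for this kind of question.
I think we disagree about which approach is simplest, but you are of course entitled to choose your own answer. I like your approach and may accept it as the answer, depending on what else comes in.
@BIQS Simpler in that it creates fewer intermediate variables to clutter my workspace. I agree that your edit may be more readable, but not necessarily simpler.
Fair enough. I think the final answer is simple and readable, so I'm happy with it.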
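I like it. Could you submit this as a separate answer? It is a completely different approach, and in a common use case it gives a different result from your original answer. For example, for a string like "My name is and his name is " in which the same tagged name appears twice, the two approaches produce different results. The base R approach gives "My name is [Name #1] and his name is also [Name #2]", whereas the tidyverse approach gives "My name is [Name #1] and his name is also [Name #1]".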

【Solution 2】:

Here is a solution that relies on the gsubfn and proto packages.

# Define the string to which the function will be applied
my_string <- "My name is <N Timon N> and his name is <N Pumba N>"

# Define the replacement function
replacement_fn <- function(x) {

  # gsubfn keeps a running match counter in the proto object's `count` variable
  replacement_proto_fn <- proto::proto(fun = function(this, x) {
      paste0("[Name #", count, "]")
  })

  gsubfn::gsubfn(pattern = "<N(.+?)N>",
                 replacement = replacement_proto_fn,
                 x = x)
}

# Use the function on the string
replacement_fn(my_string)

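【Discussion】:

You might be interested in the glue package: github.com/tidyverse/glue. If I understand the vignette correctly (I have never used it myself), its syntax looks like "Hiya, I'm {Timon}, yo".
Thanks, I think that's a good suggestion. I'd love to be able to use glue for this, but I haven't figured it out yet.
I think the count part would be a bit tricky: glue(gsub("<N\\s+(\\w+)\\s+N>", "{\\1}", my_string), Timon = "[Name #1]", Pumba = "[Name #2]") # My name is [Name #1] and his name is [Name #2]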

【Solution 3】:

Here is a different approach using dplyr + stringr:

library(dplyr)
library(stringr)

string <- "My name is <N Timon N> and his name is <N Pumba N>"

string %>%
  str_extract_all("<N(.+?)N>") %>%
  unlist() %>%
  setNames(paste0("[Name #", 1:length(.), "]"), .) %>%
  str_replace_all(string, .)

# [1] "My name is [Name #1] and his name is [Name #2]"

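Note:

This solution extracts the matches with str_extract_all, uses them as the names of a replacement vector, and then passes that named vector to str_replace_all, which searches for each name and replaces it with the corresponding value.

As the OP pointed out, in some cases this solution gives a different result from the gregexpr + regmatches approach. For example: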

string = "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"

string %>%
  str_extract_all("<N(.+?)N>") %>%
  unlist() %>%
  setNames(paste0("[Name #", 1:length(.), "]"), .) %>%
  str_replace_all(string, .)

Output:

[1] "My name is [Name #1] and his name is [Name #2], [Name #1] again"

【Discussion】:

【Solution 4】:

Simple, probably slow, but it should work:

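library(stringi)

my_string <- "My name is <N Timon N> and his name is <N Pumba N>"

ct <- 1
while (TRUE) {
  old_string <- my_string
  # Replace only the first remaining match, labelled with the current counter
  my_string <- stri_replace_first_regex(my_string, "<N.*?N>",
                                        paste0("[Name #", ct, "]"))
  # Stop once a pass changes nothing, i.e. no matches are left
  if (old_string == my_string) break
  ct <- ct + 1
}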

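【Discussion】:

@user's solution is better!
This approach works for the specific example in the question, but whether it works in general depends on the regex and the replacement. For example, if the regex is something like whitespace or a word boundary (e.g. "\\w+") and the replacement does not remove the match, the user ends up in an endless loop.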
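
One way to sidestep the endless loop described above is to count the matches first and then replace exactly that many times. The sketch below uses base R's gregexpr and sub rather than stringi, and the function name replace_in_order is made up for illustration. It always terminates, but if the replacement text itself matches the pattern (as with "\\w+"), the result will still not be what you want.

```r
# Hypothetical helper: bounded version of the replace-first-match loop
replace_in_order <- function(x, pattern, label = "Name") {
  stopifnot(is.character(x), length(x) == 1L)
  # Count the matches up front; gregexpr() returns -1 when there are none
  n <- sum(gregexpr(pattern, x, perl = TRUE)[[1]] > 0)
  for (i in seq_len(n)) {
    # sub() replaces only the first remaining match
    x <- sub(pattern, paste0("[", label, " #", i, "]"), x, perl = TRUE)
  }
  x
}

replace_in_order("My name is <N Timon N> and his name is <N Pumba N>", "<N.*?N>")
# [1] "My name is [Name #1] and his name is [Name #2]"
```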