Replacing everything with a backslash till next white space: answers

【Question Title】: Replacing everything with a backslash till next white space
【Posted】: 2021-12-27 14:18:10
【Question Description】:

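As part of preprocessing my data, I want to replace anything that starts with a backslash, up to the next whitespace, with an empty string. For example, \fs24 needs to be replaced with nothing, and so does \qc23424. Backslash tags can occur multiple times, and I want to remove all of them. I have created a list of "tags to eradicate", and my goal is to use it in a regular expression to clean the extracted text.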

Input string: This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string.

Expected output: This is a string and it contains some texts and tags. which I want to remove from my string.

I am using Python's regex-based substitution function:

udpated = re.sub(r'/\fs\d+', '')

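However, this does not produce the desired result. Alternatively, I have built an eradication list and replace its entries one by one in a loop, from top to bottom, but that is a performance killer.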

【Question Discussion】:

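Do you mean something like \\[a-z]+\d+?
re.sub takes three arguments; you are not passing the string to perform the substitution on. Also, what do you think / does in a regular expression?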

Tags: python regex

【Solution 1】:

Assuming the "tags" can also appear at the start of the string, and to avoid selecting false positives, maybe you could use:

\s?(?<!\S)\\[a-z\d]+

and replace the matches with nothing. See the online demo.

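\s? - Optionally match a single whitespace character (when a tag is mid-string and therefore preceded by a space);
(?<!\S) - Assert that the position is not preceded by a non-whitespace character (this also allows a position at the very start of the input);
\\ - A literal backslash;
[a-z\d]+ - One or more (greedy) characters from the given class.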
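A minimal Python sketch applying this pattern with re.sub; the helper name strip_tags is hypothetical. It shows that the optional \s? also consumes the space in front of each tag, so no double space is left, and that a backslash glued to a preceding word is not treated as a tag:

```python
import re

# Solution 1's pattern: an optional whitespace character, then a backslash that
# sits at the start of the string or right after whitespace, then the tag body.
TAG_RE = re.compile(r"\s?(?<!\S)\\[a-z\d]+")


def strip_tags(text: str) -> str:
    """Remove backslash tags together with one preceding whitespace character."""
    return TAG_RE.sub("", text)


s = r"This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."
print(strip_tags(s))
# This is a string and it contains some texts and tags. which I want to remove from my string.

# The backslashes here follow non-whitespace characters, so nothing is removed.
print(strip_tags(r"see path\to\file"))
```

Note that a tag at the very start of the string has no preceding space to consume, so any space after it remains (r"\fs24 hello" becomes " hello").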

【Discussion】:

【Solution 2】:

First, / does not belong in the regex at all.

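Second, even with a raw string literal, \ itself has a special meaning to the regex engine, so you still need to escape it. (Without a raw string literal, you would need '\\\\fs\\d+'.) In r'\\fs\d+', the \ directly before the f is matched literally because the first \ escapes it; the \ before d is part of \d, the character class that matches a digit.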
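To illustrate the escaping points above, a small check (the variable names are for illustration only): the question's r'/\fs\d+' finds nothing, because \f is the regex escape for a form feed, while r'\\fs\d+' and its non-raw spelling are the same string and both match the tag.

```python
import re

text = r"This is a string \fs24 and it contains..."

# The question's pattern: "/" is a literal slash and "\f" is the regex
# escape for a form feed, so it cannot match the tag.
broken = r"/\fs\d+"

# "\\" in the pattern tells the regex engine to match one literal backslash.
raw_pattern = r"\\fs\d+"
# Without a raw literal, each of those backslashes must be doubled again.
plain_pattern = "\\\\fs\\d+"

print(re.search(broken, text))               # None
print(raw_pattern == plain_pattern)          # True
print(re.search(raw_pattern, text).group())  # \fs24
```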

Finally, sub takes three arguments: the pattern, the replacement text, and the string on which to perform the replacement.

>>> re.sub(r'\\fs\d+', '', r"This is a string \fs24 and it contains...")
'This is a string  and it contains...'
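If you want to drive this from the "tags to eradicate" list mentioned in the question instead of looping over it, one option is to join the list into a single alternation and call sub once. This is a sketch: the list contents (fs, qc) are assumed from the question's examples, and the optional leading space is consumed so no double space is left behind.

```python
import re

# Hypothetical eradication list, built from the tags in the question's example.
tags_to_remove = ["fs", "qc"]

# A single compiled alternation replaces one re.sub pass per tag.
# re.escape protects against regex metacharacters in tag names;
# " ?" also removes one space in front of the tag.
tag_re = re.compile(r" ?\\(?:" + "|".join(map(re.escape, tags_to_remove)) + r")\d+")

s = r"This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."
print(tag_re.sub("", s))
# This is a string and it contains some texts and tags. which I want to remove from my string.
```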

【Discussion】:

【Solution 3】:

Does this work for you?

import re

input_string = r"This is a string \fs24 and it contains..."
cleaned = re.sub(
    r"\\\w+\s*",  # a backslash followed by word characters and optional whitespace;
    '',           # replace it with an empty string;
    input_string  # in your input string
)

>>> re.sub(r"\\\w+\s*", "", r"\fs24 hello there")
'hello there'
>>> re.sub(r"\\\w+\s*", "", "hello there")
'hello there'
>>> re.sub(r"\\\w+\s*", "", r"\fs24hello there")
'there'
>>> re.sub(r"\\\w+\s*", "", r"\fs24hello \qc23424 there")
'there'
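Applied to the full input string from the question, this pattern leaves a space before the period: \s* consumes the whitespace after a tag, not before it, and \w+ stops at the "." after \qc23424.

```python
import re

s = r"This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."

# \s* removes the space after "\fs24", but nothing follows "\qc23424" except ".",
# so the space in front of that tag survives.
result = re.sub(r"\\\w+\s*", "", s)
print(result)
# This is a string and it contains some texts and tags . which I want to remove from my string.
```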

【Discussion】:

【Solution 4】:

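'\\' matches a literal '\', and '\w+' matches the word characters (letters, digits, underscore) after it, stopping at the first non-word character such as a space or '.'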

import re
s = r"""This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."""
re.sub(r'\\\w+', '', s)

Output:

'This is a string  and it contains some texts and tags . which I want to remove from my string.'

【Discussion】:

【Solution 5】:

I tried this, and it works well for me:

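def remover(text):
    # The tag runs from the first backslash up to the next space.
    tag = text.split("\\", 1)[1].split(" ", 1)[0]
    removable = "\\" + tag + " "
    if removable not in text:
        # No trailing space (e.g. the tag ends the string): remove the bare tag,
        # otherwise the loop below would never make progress.
        removable = "\\" + tag
    text = text.replace(removable, "")
    # Signal whether any backslash is left to process.
    return text, "\\" in text


text = "hello \\I'm new here \\good luck"
state = "\\" in text  # only start the loop if there is something to remove
while state:
    text, state = remover(text)
print(text)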

【Discussion】: