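Delete digits in Python (Regex): answers

【Question title】: Delete digits in Python (Regex)
【Posted】: 2010-10-23 11:02:10
【Question】:

I'm trying to delete all the digits from a string. However, the following code also deletes digits contained in words, which I obviously don't want. I've been trying many regular expressions without success.

Thanks!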

import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
s = re.sub(r"\d+", "", s)
print(s)

Result:

This must not b deletd, but the number at the end yes 

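【Question discussion】:

My answer is the only one that works in general, thanks for the votes. The link in my answer shows the ways the other answers fail (it is the only answer with an image).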

Tags: python regex digits

【Solution 1】:

I had a light-bulb moment, tried this, and it worked:

import re
sol = re.sub(r'[~^0-9]', '', 'aas30dsa20')

Output:

aasdsa
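Here is a quick check of what this character class actually matches. Inside [...], ~ is always literal, and ^ is literal when it is not the first character. The class therefore matches ~, ^ or any ASCII digit anywhere, including digits inside words. This is a minimal sketch: the first input is illustrative, and the second is the question's string.

```python
import re

# [~^0-9] matches "~", "^" (literal here, since it is not first in the class) or any ASCII digit.
CLASS = r"[~^0-9]"

print(repr(re.sub(CLASS, "", "a~b^c 3d 42")))  # 'abc d '
# Digits inside words are removed too, just as in the question's original attempt:
print(repr(re.sub(CLASS, "", "This must not b3 delet3d, but the number at the end yes 134411")))
# 'This must not b deletd, but the number at the end yes '
```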

【Discussion】:

【Solution 2】:

To match only pure integers in a string:

\b(?<![0-9-])(\d+)(?![0-9-])\b

It handles this correctly, matching only the standalone numbers after million:

max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 
8m9n o0p2     million     0 22 333  4444

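All 8 other regex answers on this page fail on this input in various ways.

The dash at the end of the first [0-9-] (in the lookbehind) protects -007, and the dash in the second group (in the lookahead) protects 8-.

You can also use \d instead of 0-9 if you prefer.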

at regex101
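You can check the test input above in Python with re.findall. The pattern has one capturing group, so findall returns just the captured numbers. This is a minimal sketch. The names PURE_INT and sample are illustrative, and the two input lines are assumed to be joined by a single space.

```python
import re

# Pattern from this answer: a whole number not preceded or followed by a digit or a dash.
PURE_INT = r"\b(?<![0-9-])(\d+)(?![0-9-])\b"

sample = ("max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 "
          "8m9n o0p2     million     0 22 333  4444")

print(re.findall(PURE_INT, sample))  # ['0', '22', '333', '4444']
```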

Can it be simplified?

【Discussion】:

The parentheses around \d+ can be dropped, but they are useful for capturing the pure number.

【Solution 3】:

You can try this:

import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

Result:

'This must not b3 delet3d, but the number at the end yes'

The same rule also works for:

s = "This must not b3 delet3d, 4566 but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

Result:

'This must not b3 delet3d, but the number at the end yes'
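Both examples can be reproduced as below. This is a sketch. The last line uses an extra, illustrative input: it shows that \s\d+ also removes the leading digits of a word that follows whitespace, such as "3at".

```python
import re

expected = "This must not b3 delet3d, but the number at the end yes"
s1 = "This must not b3 delet3d, but the number at the end yes 134411"
s2 = "This must not b3 delet3d, 4566 but the number at the end yes 134411"

# Each match is one whitespace character plus the digits right after it.
print(re.sub(r"(\s\d+)", "", s1) == expected)  # True
print(re.sub(r"(\s\d+)", "", s2) == expected)  # True
# Caveat: digits at the start of a word are removed together with the space before them.
print(repr(re.sub(r"(\s\d+)", "", "eat 3at")))  # 'eatat'
```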

【Discussion】:

【Solution 4】:

>>> import re
>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r"\d*$", "", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

This removes the digits at the end of the string.
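Here is a sketch of this behaviour. \d*$ only touches digits at the very end of the string, so a number in the middle survives and the space before the trailing number is kept. The second input is illustrative.

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
print(repr(re.sub(r"\d*$", "", s)))  # 'This must not b3 delet3d, but the number at the end yes '
# Only the final run of digits is removed; "12" in the middle is left alone.
print(repr(re.sub(r"\d*$", "", "a 12 b 34")))  # 'a 12 b '
```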

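【Discussion】:

【Solution 5】:

I don't know what your real data looks like, but most of the answers look like they won't handle negative numbers or decimals: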

re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", "", s)

The above should also handle things like:

"This must not b3 delet3d, but the number at the end yes -134.411"

But this is still incomplete. You probably need to define more precisely what can appear in the files you need to parse.
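Once the string argument is passed to re.sub, the pattern can be tried on the example above. This is a sketch; the name NUM and the second input are illustrative. A decimal such as -134.411 may be removed by two separate matches, first the signed integer part and then the fractional part, but the end result is the same here.

```python
import re

# The pattern from this answer: an optional sign after whitespace or at the start,
# then an integer or a decimal.
NUM = r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b"

s = "This must not b3 delet3d, but the number at the end yes -134.411"
print(repr(re.sub(NUM, "", s)))             # 'This must not b3 delet3d, but the number at the end yes'
print(repr(re.sub(NUM, "", "pi is 3.14 ok")))  # 'pi is ok'
```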

Edit: it's also worth noting that '\b' changes depending on the locale/character set you're using, so be careful.
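In Python 3, \d, \w and \b are Unicode-aware by default for str patterns. The re.ASCII flag restricts them to ASCII, and re.LOCALE only applies to bytes patterns. The sketch below uses an Arabic-Indic digit as an illustrative input.

```python
import re

text = "a \u0663 b"  # U+0663 ARABIC-INDIC DIGIT THREE between two words

# Default (Unicode) matching: \d matches any Unicode decimal digit,
# and \b treats that digit as a word character.
print(repr(re.sub(r"\b\d+\b", "", text)))                  # 'a  b'
# With re.ASCII, \d only matches [0-9], so nothing is removed.
print(repr(re.sub(r"\b\d+\b", "", text, flags=re.ASCII)))  # 'a ٣ b'
```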

【Discussion】:

【Solution 6】:

A non-regex solution:

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'

Split on " ", use str.isdigit() to check whether each chunk is a number, then join the chunks back together. More verbosely (without a list comprehension):

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)
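Here is the same approach wrapped in a function; drop_numeric_tokens is a hypothetical name. The sketch shows one consequence of str.isdigit(): tokens with a sign or a decimal point are not all digits, so they are kept.

```python
def drop_numeric_tokens(s: str) -> str:
    """Remove space-separated tokens that consist only of digits (this answer's approach)."""
    return " ".join(x for x in s.split(" ") if not x.isdigit())


print(drop_numeric_tokens("This must not b3 delet3d, but the number at the end yes 134411"))
# "-5" and "3.14" are not all digits, so they survive; "42" is removed.
print(drop_numeric_tokens("temp -5 pi 3.14 n 42"))  # temp -5 pi 3.14 n
```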

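【Discussion】:

【Solution 7】:

Relying on \s isn't great: it needs a whitespace character right next to the number, so a number at the start of the string or next to punctuation is missed. A first step towards a better solution is: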

re.sub(r"\b\d+\b", "", s)

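Note that the pattern is a raw string. In a normal string literal, \b is the backspace escape, and here we need the regex word-boundary escape instead. A slightly fancier version is: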

re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

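This tries to remove the leading/trailing whitespace when the string starts or ends with a number. I say "tries" because if there are several numbers at the end, you still end up with some whitespace.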
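Here is a sketch of the fancier version, with the first alternative anchored by ^ at the start of the string. The inputs are illustrative. The second call shows the leftover whitespace mentioned above when several numbers end the string.

```python
import re

# ^\d+\W+  : a number at the start plus the separator after it
# \b\d+\b  : a standalone number anywhere
# \W+\d+$  : a number at the end plus the separator before it
FANCY = r"^\d+\W+|\b\d+\b|\W+\d+$"

print(repr(re.sub(FANCY, "", "12 abc 34 def 56")))  # 'abc  def'
print(repr(re.sub(FANCY, "", "abc 1 2")))           # 'abc '
```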

【Discussion】:

【Solution 8】:

Add a space before \d+.

>>> import re
>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

Edit: after reading the comments, I decided to write a more complete answer. I think this covers all the cases.

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)

【Discussion】:

What about strings like "3at"?
Here are 2 more unit test cases: '123 should be removed.' and "You've been 0wn3d"
Another one: re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", "1 2 3 I fail")
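The test strings from these comments can be run against both versions of the answer. This is a sketch; "eat 3at" is an illustrative stand-in for the "3at" case.

```python
import re

FIRST = r" \d+"                    # first version, replaced with " "
EDITED = r"^\d+\s|\s\d+\s|\s\d+$"  # edited version, replaced with " "

print(repr(re.sub(FIRST, " ", "eat 3at")))                   # 'eat at'  (digits removed from "3at")
print(repr(re.sub(EDITED, " ", "eat 3at")))                  # 'eat 3at'
print(repr(re.sub(EDITED, " ", "123 should be removed.")))   # ' should be removed.'
print(repr(re.sub(EDITED, " ", "You've been 0wn3d")))        # "You've been 0wn3d"
# The first match "1 " consumes the space after 1, so "2" has no whitespace
# before it left to match:
print(repr(re.sub(EDITED, " ", "1 2 3 I fail")))             # ' 2 I fail'
```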

【Solution 9】:

To also handle a digit string at the start of the line:

s = re.sub(r"(^|\W)\d+", "", s)
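Here is a sketch of this pattern on the question's string, plus an illustrative input that starts with a number. (^|\W) is part of the match, so the character before the number is removed too. Digits at the start of a word that follows whitespace are also removed.

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
print(repr(re.sub(r"(^|\W)\d+", "", s)))            # 'This must not b3 delet3d, but the number at the end yes'
print(repr(re.sub(r"(^|\W)\d+", "", "42 is 3at")))  # ' isat'
```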

【Discussion】:

【Solution 10】:

Try this:

r"\b\d+\b"

This will only match numbers that are not part of another word.

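【Discussion】:

This doesn't remove the first or last number: s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"
I just tested it with your string and got the expected result. \b matches at the start or end of the string, or next to any non-word character (anything outside [A-Za-z0-9_]). I tested it in IronPython, though, so I don't know whether Python handles word boundaries differently.
I haven't tried this, but couldn't you do: [^\b]\d+[$\b]
sharth: that's essentially the same thing. \b matches at the start or end of the string. It is an "empty pattern" that matches "between" a word and a non-word. So re.sub(r'\b', '!', 'one two') gives "!one! !two!"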
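The raw-string note in Solution 7 suggests one possible explanation for the differing results: written as a plain literal, "\b\d+\b" contains backspace characters rather than word boundaries. This sketch compares the two on the string from the first comment.

```python
import re

s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"

# Raw string: \b reaches the regex engine as a word boundary.
print(repr(re.sub(r"\b\d+\b", "", s)))
# ' This must not b3 delet3d,  but the number at the end yes '

# Plain literal: "\b" is a backspace (\x08). "\\d" is written out here to avoid an
# invalid-escape warning; it is what "\d" becomes in a non-raw literal anyway.
plain = "\b\\d+\b"
print(plain[0] == "\x08")         # True
print(re.sub(plain, "", s) == s)  # True: s has no backspaces, so nothing matches
```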

【Solution 11】:

If your number is always at the end of the string, try: re.sub(r"\d+$", "", s)

Otherwise, you can try re.sub(r"(\s)\d+(\s)", r"\1\2", s)

You can adjust the backreferences to keep only one or both of the whitespace characters (\s matches any whitespace character).
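Here is a sketch of the backreference variants on an illustrative input. The replacement must also be a raw string or use doubled backslashes. In a plain literal, "\1\2" is two control characters, \x01 and \x02, not backreferences.

```python
import re

PAT = r"(\s)\d+(\s)"
s = "a 12 b"

print(repr(re.sub(PAT, r"\1\2", s)))  # 'a  b'  (both whitespace characters kept)
print(repr(re.sub(PAT, r"\1", s)))    # 'a b'   (only the first one kept)
print(repr(re.sub(PAT, "\1\2", s)))   # 'a\x01\x02b'  (plain literal: control characters)
```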

【Discussion】:

\W might be better than \s. Also, a nicer variant would be "\b\d+\b", except that it doesn't work for me!