How to remove all text between the outer parentheses in a string?

【Question title】: How to remove all text between the outer parentheses in a string?
【Posted】: 2016-09-28 10:38:30
【Question description】:

When I have a string like this:

s1 = 'stuff(remove_me)'

I can easily remove the parentheses and the text inside them using

import re

# returns 'stuff'
res1 = re.sub(r'\([^)]*\)', '', s1)

as explained here.

But sometimes I run into nested expressions like this:

s2 = 'stuff(remove(me))'

When I run the command from above, I end up with

'stuff)'

I also tried:

re.sub(r'\(.*?\)', '', s2)

which gives me the same output.

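How can I remove everything inside the outer parentheses, including the parentheses themselves, so that I end up with 'stuff' here too (it should work for arbitrarily complex expressions)?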

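【Question comments】:

See Remove text between () and [] in python.
@WiktorStribiżew: Thanks! But that one is about non-nested expressions. And I am pretty sure something exists that does not require a lot of if-else clauses and for loops.
This answer contains the regex you need, but it requires the PyPI regex module.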

Tags: python regex parentheses

【Answer 1】:

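Note: \(.*\) matches the first ( from the left, then any 0+ characters (other than a newline, unless the DOTALL modifier is enabled) up to the last ), and does not account for properly nested parentheses.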
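To make the note above concrete, here is a small sketch comparing the greedy, lazy, and negated-class patterns. The `siblings` string and all variable names are made up for illustration; only `'stuff(remove(me))'` comes from the question.

```python
import re

nested = 'stuff(remove(me))'
siblings = 'a(b)c(d)e'  # made-up input with two separate groups

# Greedy .* runs from the first "(" to the last ")".
greedy_nested = re.sub(r'\(.*\)', '', nested)
greedy_siblings = re.sub(r'\(.*\)', '', siblings)

# Lazy .*? and the negated class both stop at the first ")".
lazy_nested = re.sub(r'\(.*?\)', '', nested)
negated_nested = re.sub(r'\([^)]*\)', '', nested)
negated_siblings = re.sub(r'\([^)]*\)', '', siblings)

print(greedy_nested)     # stuff
print(greedy_siblings)   # ae  (the "c" between the groups is lost)
print(lazy_nested)       # stuff)
print(negated_nested)    # stuff)
print(negated_siblings)  # ace
```

So the greedy pattern happens to work for a single nested group, but it swallows text between separate groups, while the other two stop too early on nested input.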

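To remove nested parentheses correctly with a regex in Python, you can use the simple pattern \([^()]*\) (it matches a (, then 0+ characters other than ( and ), and then a )) in a while block with re.subn: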

import re

def remove_text_between_parens(text):
    n = 1  # run at least once
    while n:
        text, n = re.subn(r'\([^()]*\)', '', text)  # remove non-nested/flat balanced parts
    return text

Basically: remove every (...) that has no ( or ) inside, and repeat until no match is found. Usage:

print(remove_text_between_parens('stuff (inside (nested) brackets) (and (some(are)) here) here'))
# => stuff   here

A non-regex approach is also possible:

def removeNestedParentheses(s):
    ret = ''
    skip = 0
    for i in s:
        if i == '(':
            skip += 1
        elif i == ')' and skip > 0:
            skip -= 1
        elif skip == 0:
            ret += i
    return ret

x = removeNestedParentheses('stuff (inside (nested) brackets) (and (some(are)) here) here')
print(x)
# => stuff   here

See another Python demo

【Comments】:

In case you need the re approach to remove nested square brackets, use the r'\[[^][]*]' pattern. For curly braces, use r'{[^{}]*}'.
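As a rough sketch of how the three patterns could share one loop: the `INNERMOST` table and the `remove_nested` function are hypothetical names introduced here, not part of the answer. The patterns themselves are the ones quoted above.

```python
import re

# One pattern per bracket type; each matches an innermost pair only,
# i.e. a pair with no bracket of the same type inside it.
INNERMOST = {
    '()': r'\([^()]*\)',
    '[]': r'\[[^][]*]',
    '{}': r'{[^{}]*}',
}

def remove_nested(text, brackets='()'):
    """Strip innermost pairs repeatedly until nothing is left to remove."""
    pattern = re.compile(INNERMOST[brackets])
    n = 1  # run at least once
    while n:
        text, n = pattern.subn('', text)
    return text

print(remove_nested('a[b[c]d]e', '[]'))  # ae
print(remove_nested('a{b{c}d}e', '{}'))  # ae
print(remove_nested('a(b(c)d)e'))        # ae
```

Each call handles a single bracket type; mixing types in one pass would need a different design.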

【Answer 2】:

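As mentioned before, you would need a recursive regex to match arbitrary levels of nesting, but if you know there can be at most one level of nesting, try this pattern: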

\((?:[^)(]|\([^)(]*\))*\)

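[^)(] matches a character that is not a parenthesis (negated class).
|\([^)(]*\) or it matches another ( ) pair with any number of non-)( characters inside.
(?:...)* all of this, any number of times, inside the outer ( )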

Here is a demo at regex101

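The [^)(] before the alternation is used without a + quantifier so that the pattern fails faster if the input is unbalanced.
You need to add one more level for each level of nesting that might occur. E.g. for at most 2 levels: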

\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)

Another demo at regex101
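Since each extra level wraps the previous pattern in the same way, the pattern can also be generated. The following is only a sketch; `bounded_paren_pattern` is a hypothetical helper name, and the sample strings are made up.

```python
import re

def bounded_paren_pattern(depth):
    """Build a pattern for (...) containing at most `depth` nested levels."""
    pattern = r'\([^)(]*\)'  # depth 0: no parentheses inside
    for _ in range(depth):
        # Wrap the previous level: either a non-parenthesis character
        # or a complete pair of the shallower kind.
        pattern = r'\((?:[^)(]|' + pattern + r')*\)'
    return pattern

one_level = bounded_paren_pattern(1)
two_levels = bounded_paren_pattern(2)

print(one_level)                               # \((?:[^)(]|\([^)(]*\))*\)
print(re.sub(one_level, '', 'stuff(remove(me))'))  # stuff
print(re.sub(one_level, '', 'a(b(c(d)))e'))        # a(b)e  (too shallow)
print(re.sub(two_levels, '', 'a(b(c(d)))e'))       # ae
```

With too small a depth the outermost pair is not matched, so part of the text is left behind, as the third call shows.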

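【Comments】:

Very nice, thanks for the detailed explanation (upvoted)!
I just ran into a similar situation and searched a lot for this solution. Thanks for sharing the idea along with a good explanation.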

【Answer 3】:

re matches are greedy, so they try to match as much text as possible; for the simple test case you mentioned, just let the regex run:

>>> re.sub(r'\(.*\)', '', 'stuff(remove(me))')
'stuff'

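【Comments】:

@Cleb Note that this does not check whether the parentheses match. E.g. in foo(bar)baz(spam)e)ggs, it will leave only fooggs.
@ivan_pozdeev: Thanks for the warning, good to know! In my example they should match, but I will add a check anyway.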
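One possible shape for such a check is a simple depth counter. This is a minimal sketch; the function name `parens_balanced` is made up here and is not from the thread.

```python
def parens_balanced(text):
    """Return True if every ")" closes an earlier "(" and none stay open."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ")" with no matching "(" before it
                return False
    return depth == 0

print(parens_balanced('stuff(remove(me))'))       # True
print(parens_balanced('foo(bar)baz(spam)e)ggs'))  # False
```

Running it before the greedy substitution would catch inputs like the one in the comment above.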

【Answer 4】:

If you are sure the parentheses are balanced to begin with, just use the greedy version:

re.sub(r'\(.*\)', '', s2)

【Comments】:

【Answer 5】:

https://regex101.com/r/kQ2jS3/1

'(\(.*\))'

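This captures everything from the first opening parenthesis to the furthest closing parenthesis, including what is in between.

Your old regex captures the first opening parenthesis and everything up to the next closing parenthesis.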

【Comments】:

Same as the other two answers, but thanks anyway (upvoted)... :)

【Answer 6】:

I found a solution here:

http://rachbelaid.com/recursive-regular-experession/

It says:

>>> import regex
>>> regex.search(r"^(\((?1)*\))(?1)*$", "()()") is not None
True
>>> regex.search(r"^(\((?1)*\))(?1)*$", "(((()))())") is not None
True
>>> regex.search(r"^(\((?1)*\))(?1)*$", "()(") is not None
False
>>> regex.search(r"^(\((?1)*\))(?1)*$", "(((())())") is not None
False
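The snippet above only validates balanced strings. For the removal asked about in the question, a recursive pattern could be used along these lines. This is a sketch that assumes the third-party regex module from PyPI is installed; the function name and the re-based fallback are additions made here for illustration, not part of the linked article.

```python
import re

try:
    import regex  # third-party module: pip install regex
except ImportError:
    regex = None

def remove_parens_recursive(text):
    """Remove every balanced (...) group, at any nesting depth."""
    if regex is not None:
        # (?R) recurses into the whole pattern, so an inner (...) is
        # matched by the same rule as the outer one.
        return regex.sub(r'\((?:[^()]|(?R))*\)', '', text)
    # Fallback when regex is unavailable: strip innermost pairs repeatedly.
    n = 1
    while n:
        text, n = re.subn(r'\([^()]*\)', '', text)
    return text

print(remove_parens_recursive('stuff(remove(me))'))  # stuff
```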

【Comments】: