How to get an expression between balanced parentheses

【Question title】: How to get an expression between balanced parentheses
【Posted】: 2016-11-07 18:47:30
【Question description】:

Suppose I am given a string of the following type:

"(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"

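I would like to extract the substrings contained within the topmost level of parentheses, i.e. I want to obtain the strings "this is (haha) a string(()and it's sneaky)" and "lorem".

Is there a nice pythonic way to do this? Regular expressions are not obviously up to this task, but maybe there is a way to get an xml parser to do the job? For my application I can assume the parentheses are well formed, i.e. not something like (()(().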

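【Question comments】:

I think you should define a function for this. In that function, iterate over the string and maintain a flag that tells you whether you are at the topmost level of parentheses. That gives you the start and end indices, so you can extract the substring and append it to the final answer.
Would that be considered a "pythonic" approach? I would solve it with a counter that increments on hitting "(" and decrements on hitting ")". When it returns to 0 after having seen at least one "(", you can append the substring between the initial and final positions to a list.
Hi utkarsh13. Thanks for that. That is more or less the solution I had in mind, but I was wondering whether there is some faster functionality built into Python that does this in a few easy-to-read lines.
@user177955 Quick and dirty: print re.match(string.replace(")",").").replace("(",".("), string).groups()[0::4]. Sorry, I couldn't resist: the string looked too much like a regex, so I turned it into one. :P That said, you should really write your own stack or follow what utkarsh said.
@SuperSaiyan Of course, for any string there exists an arbitrarily complex RE that does the job :-P

Tags: python string xml-parsing parentheses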

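【Solution 1】:

This is a standard use case for a stack: you read the string character by character, and whenever you encounter an opening parenthesis you push the symbol onto the stack; when you encounter a closing parenthesis, you pop the symbol from the stack.

Since you only have a single type of parenthesis, you don't actually need a stack; it is enough to remember how many opening parentheses are currently open.

In addition, in order to extract the text, we also remember where the part starts when a first-level parenthesis opens, and collect the resulting string when we encounter the matching closing parenthesis.

This could look like this: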

string = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"

stack = 0
startIndex = None
results = []

for i, c in enumerate(string):
    if c == '(':
        if stack == 0:
            startIndex = i + 1 # string to extract starts one index later

        # push to stack
        stack += 1
    elif c == ')':
        # pop stack
        stack -= 1

        if stack == 0:
            results.append(string[startIndex:i])

print(results)
# ["this is (haha) a string(()and it's sneaky)", 'lorem']
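For comparison, here is a minimal sketch of the explicit-stack variant described at the top of this answer. It is an illustration, not a better solution: the function name top_level_groups is made up, and the ValueError checks are an added assumption, since the question states the input is always well formed.

```python
def top_level_groups(text):
    """Return the substrings enclosed by the outermost parentheses in text."""
    open_positions = []  # explicit stack: index of every '(' not yet closed
    results = []
    for i, c in enumerate(text):
        if c == '(':
            open_positions.append(i)
        elif c == ')':
            if not open_positions:
                raise ValueError(f"unmatched ')' at index {i}")
            start = open_positions.pop()
            if not open_positions:
                # the stack is empty again, so a top-level group just closed
                results.append(text[start + 1:i])
    if open_positions:
        raise ValueError(f"unmatched '(' at index {open_positions[-1]}")
    return results


print(top_level_groups("(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"))
# ["this is (haha) a string(()and it's sneaky)", 'lorem']
```

Storing the indices rather than the symbols means the start position comes straight off the stack, at the cost of a list that the counter version does not need.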

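【Comments】:

@poke. Thanks for writing up the comments of utkarsh13 and Vaibhav Bajaj. I have a small question: how does for i, c in enumerate(string) work?
@user177955 Iterating over enumerate(x) gives you a two-tuple on every iteration that contains the index in addition to the value from the iterable. So instead of just getting each character from the string, we get each character paired with its index in the string.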
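A small illustration of the enumerate behaviour described in the comment above; the sample string word is made up for the example.

```python
word = "a(b)"

# enumerate yields (index, value) pairs instead of bare values
pairs = list(enumerate(word))
print(pairs)
# [(0, 'a'), (1, '('), (2, 'b'), (3, ')')]

# unpacking each pair in the loop header gives the index and the character
for i, c in enumerate(word):
    assert word[i] == c
```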

【Solution 2】:

This isn't very "pythonic"... but

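def find_strings_inside(what_open, what_close, s):
    stack = []  # one entry per delimiter that is currently open
    msg = []    # characters collected inside the current top-level group
    for c in s:
        if c == what_open:
            stack.append(c)
            if len(stack) == 1:
                continue  # skip the top-level opening delimiter itself
        elif c == what_close and stack:
            stack.pop()
            if not stack:
                # the top-level group just closed: emit it and start over
                yield "".join(msg)
                msg[:] = []
        if stack:
            msg.append(c)

x = list(find_strings_inside("(", ")", "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"))

print(x)
# ["this is (haha) a string(()and it's sneaky)", 'lorem']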

【Comments】:

【Solution 3】:

Are you sure regex isn't good enough?

>>> import re
>>> x = re.compile(r'\((?:(?:\(.*?\))|(?:[^\(\)]*?))\)')
>>> x.findall("(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla")
["(this is (haha) a string(()and it's sneaky)", '(lorem)']
>>> x.findall("((((this is (haha) a string((a(s)d)and ((it's sneaky))))))) ipsom (lorem) bla")
["((((this is (haha) a string((a(s)d)and ((it's sneaky))", '(lorem)']

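【Comments】:

I didn't downvote. But regex is not the tool for places that need a stack. I should also be ashamed for suggesting the same thing in the comments (but that was just for fun ;) )
afaik there is some built-in regex package (I think it is literally import regexp) that has extended support for things that need a stack... afaik... I still don't favor regex for this solution, imho.
@JoranBeasley This isn't "you should blindly use it because it's regex and it's good"; it is more a demonstration that the claim "regular expressions are not obviously up to this task" is plain wrong, because they can do it.
I could give you a string that I'm pretty sure would break this regex... the lookahead stuff makes it hard to guess (I certainly didn't downvote, and if the regex works, great :P)
Consider "((((this is (haha) a string((a(s)d)and ((it's sneaky))))))) ipsom (lorem) bla" ... unless you are 100% sure of the maximum nesting depth ... and even then the regex gets ugly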
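To make the nesting-depth point concrete, here is a rough sketch (not the pattern from the answer above) of a regex built for a fixed maximum depth using only the standard re module, which has no recursive patterns. The helper name bounded_depth_pattern and its max_depth parameter are made up for illustration.

```python
import re

def bounded_depth_pattern(max_depth):
    """Hypothetical helper: top-level groups with at most max_depth nested levels inside."""
    inner = r"[^()]*"  # depth 0: no parentheses allowed inside the group
    for _ in range(max_depth):
        # one more level: plain characters, or a balanced group of the previous depth
        inner = r"(?:[^()]|\(" + inner + r"\))*"
    return re.compile(r"\((" + inner + r")\)")

text = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"

# this string nests two levels deep inside its first top-level group
print(bounded_depth_pattern(2).findall(text))
# ["this is (haha) a string(()and it's sneaky)", 'lorem']
```

The pattern grows with every extra level, and with max_depth set too low (1 for this string) the first top-level group is no longer returned whole, which is the fragility the comments point at.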

【Solution 4】:

This more or less repeats what has already been said, but it might be a bit easier to read:

def extract(string):
    flag = 0
    result, accum = [], []
    for c in string:
        if c == ')':
            flag -= 1
        if flag:
            accum.append(c)
        if c == '(':
            flag += 1
        if not flag and accum:
            result.append(''.join(accum))
            accum = []
    return result

>>> test = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"
>>> print(extract(test))
["this is (haha) a string(()and it's sneaky)", 'lorem']

【Comments】: