Posted: 2015-02-07 14:35:27
Question:
I want to split a string into two groups using a regular expression. The string basically has the following structure:
some text (data1 | data2 | data3 | data4)
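I used a simple regular expression:
re.match(r"^(?P<title>.*)\((?P<data>.*)\)$", s)
This works fine as long as the title and the data contain no parentheses of their own, because those conflict with the parentheses the regular expression is looking for.
If either group does contain parentheses, the result is unexpected. A lazy title (.*?) and a greedy title (.*) each fail on a different input: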
>>> import re
>>> def process_string1(s):
...     r = re.match(r"^(?P<title>.*?)\((?P<data>.*)\)$", s)
...     return r.groups()
...
>>> def process_string2(s):
...     r = re.match(r"^(?P<title>.*)\((?P<data>.*)\)$", s)
...     return r.groups()
...
>>> s = "this is an example (detail) (data1 | data2 | data3 | data4)"
>>> print(process_string1(s))
('this is an example ', 'detail) (data1 | data2 | data3 | data4')  # Wrong
>>> print(process_string2(s))
('this is an example (detail) ', 'data1 | data2 | data3 | data4')  # Good
>>> s = "this is another example (data1 (detail) | data2 | data3 | data4)"
>>> print(process_string1(s))
('this is another example ', 'data1 (detail) | data2 | data3 | data4')  # Good
>>> print(process_string2(s))
('this is another example (data1 ', 'detail) | data2 | data3 | data4')  # Wrong
Can you help me?
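A minimal sketch of one non-regex approach, assuming the data part is always the last balanced parenthesized group and the string ends with its closing ')': walk backwards from the end, counting parenthesis depth, until reaching the '(' that matches the final ')'. The function name split_title_data is hypothetical.

```python
def split_title_data(s):
    """Return (title, data), where data is the content of the final
    balanced "(...)" group that ends the string (hypothetical helper)."""
    if not s.endswith(")"):
        raise ValueError("string does not end with ')'")
    depth = 0
    # Scan right to left: ')' opens a level, '(' closes one.
    for i in range(len(s) - 1, -1, -1):
        ch = s[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
            if depth == 0:
                # This '(' matches the final ')'.
                return s[:i], s[i + 1:-1]
    raise ValueError("unbalanced parentheses")


for s in ("this is an example (detail) (data1 | data2 | data3 | data4)",
          "this is another example (data1 (detail) | data2 | data3 | data4)"):
    print(split_title_data(s))
```

Unlike a single re pattern, this handles any nesting depth inside the data part. It raises ValueError if the string does not end with ')' or if the final group is unbalanced.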
Comments:
-
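Python regular expressions are "greedy" by default: a quantifier such as .* grabs the longest string that still lets the whole pattern match. If the string contains another '(', the first part will grab that '(' as well. Maybe you should change '.*' to '[^(]*'?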
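Trying the suggestion from the comment on both sample strings: forbidding '(' in the title fixes the second example, but the first one still splits the same wrong way as process_string1. The second pattern below is a hypothetical alternative, assuming the data part contains at most one level of nested parentheses: the title stays lazy, and the data may only consist of non-parenthesis characters or complete "(...)" groups. Python's re has no recursive patterns, so deeper nesting is out of reach for this approach.

```python
import re

# The comment's suggestion: the title may not contain '('.
NO_PAREN_TITLE = re.compile(r"^(?P<title>[^(]*)\((?P<data>.*)\)$")

# Hypothetical alternative: lazy title; data is a run of characters that are
# either not parentheses or a complete, non-nested "(...)" group.
ONE_LEVEL = re.compile(r"^(?P<title>.*?)\((?P<data>(?:[^()]|\([^()]*\))*)\)$")

s1 = "this is an example (detail) (data1 | data2 | data3 | data4)"
s2 = "this is another example (data1 (detail) | data2 | data3 | data4)"

for s in (s1, s2):
    print(NO_PAREN_TITLE.match(s).groups())
    print(ONE_LEVEL.match(s).groups())
```

Because the data part cannot contain a stray ')', the lazy title has to keep growing until the '(' that starts a properly closed final group, which is what makes both samples split correctly.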