【Posted】: 2018-03-28 01:57:51
【Question】:
Given the string:
I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' .
The goal is to capture:
'v
'v
'w
while skipping 've, 'll and 't.
I tried using (?i)\'(?:ve|ll|t)\b to capture 've, 'll and 't, for example:
>>> import re
>>> x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."
>>> pattern = r"(?i)\'(?:ve|ll|t)\b"
>>> re.findall(pattern, x)
["'ll", "'ve", "'t"]
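I also tried to negate the non-capturing group in (?i)\'(?:ve|ll|t)\b by writing (?i)\'[^(?:ve|ll|t)]\b, but that does not capture 'v and 'w, which is the intended result.
How can I capture the substrings that follow a single quote but are not in a predefined list of substrings, i.e. 'll, 've and 't?
I also tried this, and it still does not work: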
pattern = "(?i)\'(?:[^ve|ll|t|\s])\b"
But [^...] only matches a single character, not a substring.
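To illustrate that last point, here is a minimal sketch, not a definitive answer. It first runs the character-class attempt to show that the class excludes individual characters, and then tries a negative lookahead, which is one common way to exclude whole substrings. The names char_class, lookahead, char_class_hits and tokens are made up for this example. Note also that without the r prefix, \b inside a Python string literal is a backspace character rather than a word boundary, so raw strings are used here.

```python
import re

x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."

# Inside [...], the characters ( ? : | ) are plain literals, so this class
# excludes the single characters ( ? : v e | l t ) and not the substrings
# ve, ll, t. A quote followed by "v" can therefore never match.
char_class = re.compile(r"(?i)'[^(?:ve|ll|t)]\b")
char_class_hits = char_class.findall(x)

# Negative lookahead: right after the quote, reject the position if one of
# the unwanted suffixes follows as a whole word; otherwise consume the
# word characters that come next.
lookahead = re.compile(r"(?i)'(?!(?:ve|ll|t)\b)\w+")
tokens = lookahead.findall(x)

print(char_class_hits)
print(tokens)  # ["'v", "'v", "'w"]
```

The \b inside the lookahead is what keeps a token such as 'v from being rejected just because it starts like 've. The \w+ after the lookahead also means a bare quote followed by a space is not returned.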
【Discussion】:
Tags: python regex quotation-marks capturing-group