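[Posted]: 2020-05-06 06:50:17
[Question]:
I'm looking for a Python regex that matches all variations of a keyword, except when the keyword is preceded by a capitalized word, unless that capitalized word is the start of a sentence. Keywords inside square brackets should be excluded as well.
For example: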
keyword = 'public record'
string1 = 'Hello. His public records are available at city hall.' #match public records: His starts a sentence, so its capitalization is ignored
string2 = 'his records are at Newsom Public Record DataBase' #nomatch
string3 = 'Public records may be available online' #match Public records
string4 = '[public records](http:/....)' #nomatch
So far I have tried:
pattern = f'(?<!\[)(?i)\\w*{keyword}\\w*' #Doesn't take into account preceding capitalized words
pattern = f'(?<![A-Z][\w-]\s)(?<!\[)(?i)\\w*{keyword}\\w*' #Doesn't work for capitalized words longer than 2 characters
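A note on why the two attempts behave this way, offered as a rough sketch rather than a fix. Python's `re` module only accepts fixed-width lookbehinds, which is why `[\w-]` can stand for a single character only. A global `(?i)` also makes `[A-Z]` match lowercase letters, and on recent Python versions (3.11 and later, as far as I know) a global flag placed mid-pattern is rejected outright. The scoped form `(?i:...)` limits case-insensitivity to the keyword part. The name `scoped` is made up for illustration.

```python
import re

keyword = "public record"

# Lookbehinds in the standard `re` module must have a fixed width,
# so a "capitalized word of any length" lookbehind is rejected.
try:
    re.compile(r"(?<![A-Z][\w-]*\s)public")
    fixed_width_required = False
except re.error:
    fixed_width_required = True

# With IGNORECASE in effect, [A-Z] no longer means "uppercase only".
ignorecase_hits_lowercase = re.search(r"(?i)[A-Z]", "abc") is not None

# Scoping the flag with (?i:...) keeps the lookbehind case-sensitive.
# This still does nothing about preceding capitalized words.
scoped = re.compile(rf"(?<!\[)(?i:\w*{re.escape(keyword)}\w*)")

print(scoped.search("Public records may be available online"))
print(scoped.search("[public records](http:/....)"))
```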
[Comments]:
-
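In `His public records`, the keyword is preceded by the capitalized word `His`, so it should not match.
-
But `His` is the start of a sentence, so I want it to match.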
-
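With logic like this, IMO it's better to go with a hybrid solution: split into words + regex. Is that acceptable, or does it have to be done with a single regex?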
-
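I'll give that a try: find positive matches with a simple pattern, split into tokens, and test the surrounding words. It would still be nice to know whether a single regex can do it, for personal enrichment!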
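For what it's worth, the hybrid idea from the comments could look roughly like the sketch below. The function name `find_keyword` and the sentence-end rule (the previous word counts as a sentence start if it is first in the string or follows `.`, `!` or `?`) are assumptions for illustration, not something stated in the question. It also only rejects a keyword that sits directly after `[`, which is a simplification of "inside brackets".

```python
import re

def find_keyword(text, keyword):
    """Return keyword variations in text, filtered by the preceding word."""
    # Simple positive match: the keyword with optional word characters around it.
    pattern = re.compile(rf"\w*{re.escape(keyword)}\w*", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        before = text[:m.start()]
        # Skip a keyword that directly follows an opening bracket.
        if before.endswith("["):
            continue
        # Last whitespace-delimited token before the match, if any.
        prev = re.search(r"(\S+)\s+$", before)
        if prev and prev.group(1)[0].isupper():
            # Text before that token; empty means the token starts the string.
            head = before[:prev.start(1)].rstrip()
            # A capitalized token that does not start a sentence blocks the match.
            if head and head[-1] not in ".!?":
                continue
        hits.append(m.group(0))
    return hits

string1 = 'Hello. His public records are available at city hall.'
string2 = 'his records are at Newsom Public Record DataBase'
string3 = 'Public records may be available online'
string4 = '[public records](http:/....)'

for s in (string1, string2, string3, string4):
    print(find_keyword(s, 'public record'))
```

Tokens with leading punctuation (an opening quote before a capitalized word, say) and abbreviations ending in a period are not handled here and would need extra rules.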