【Posted】: 2020-10-30 08:09:20
【Question】:
I have a very long string that I want to split into sentences, and I need to use a regular expression to do it.
This is my regex:
([A-Z1-9]{1}.+?\.)\s*[A-Z1-9]{1}
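(Each sentence starts with an uppercase letter or a digit and ends with a period; I then check that the first character after the period is an uppercase letter or a digit.)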
When I run the following Python code:
import re
txt = """ DETROIT—Alter Road runs northwest along this city’s border. To the east is Grosse
Pointe Park, an upscale suburb dotted with grand old mansions built in the auto industry’s
heyday. To the west is the city of Detroit, lined with abandoned houses and empty lots.
On the east side of the street, getting a mortgage to buy a home is a breeze. On the west
side, it is hardly worth trying.
Detroit is making a comeback after years of decline that led to a bankruptcy filing in
2013. But large swaths of the city are left behind, starved of the housing credit needed to
revive them. No purchase mortgages were made last year in almost a third of Detroit’s
census tracts, and fewer than five each in another third, according to data from
LendingPatterns.com, a mortgage-data analysis tool. """
r1 = re.findall(r"([A-Z1-9]{1}.+?\.)\s*[A-Z|1-9]", txt, flags=re.IGNORECASE | re.MULTILINE)
print(r1)
I got:
The first letter of every sentence gets dropped, and I don't understand why.
Thanks for your help!
【Comments】:
- I think you need:
re.findall(r'[A-Z0-9].*?\.(?=\s*(?:[A-Z0-9]|$))', text, re.DOTALL)
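A likely explanation for the dropped letters: the trailing `[A-Z|1-9]` in the original pattern consumes the first character of the next sentence, so the next match can only begin one character later (and `re.IGNORECASE` lets it begin on a lowercase letter). A lookahead checks the following character without consuming it. The snippet below is a minimal sketch comparing the two; the `sample` string and the variable names are made up for illustration and are not the asker's data.

```python
import re

# Short sample text (an assumption for illustration, not the asker's full input).
sample = "Alter Road runs north. To the east is a park. On the west side it is hard."

# Consuming approach, as in the question: the final character class eats the
# first character of the next sentence, so the next match starts one char late.
consuming = re.findall(r"([A-Z1-9].+?\.)\s*[A-Z1-9]", sample, flags=re.IGNORECASE)

# Lookahead approach, as in the comment: (?=...) only peeks at what follows.
# re.DOTALL lets "." cross line breaks, so multi-line sentences stay whole.
lookahead = re.findall(r"[A-Z0-9].*?\.(?=\s*(?:[A-Z0-9]|$))", sample, flags=re.DOTALL)

print(consuming)  # ['Alter Road runs north.', 'o the east is a park.']
print(lookahead)  # ['Alter Road runs north.', 'To the east is a park.', 'On the west side it is hard.']
```

Note that the consuming version also loses the last sentence, because nothing follows its final period. Either pattern will still split on periods inside abbreviations or decimal numbers, so treat this as a rough splitter rather than a full sentence tokenizer.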