【Question title】: Regex removes the first character from the group - Python
【Posted】: 2020-10-30 08:09:20
【Question description】:

I have a long string that I want to split into sentences. To do that, I need to use a regular expression.

This is my regex:

([A-Z1-9]{1}.+?\.)\s*[A-Z1-9]{1}

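(Each sentence starts with an uppercase letter or a digit and ends with a period; I then check whether the first character after the period is an uppercase letter or a digit.)

When I run the following Python code: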

    import re

    txt = """ DETROIT—Alter Road runs northwest along this city’s border. To the east is Grosse 
           Pointe Park, an upscale suburb dotted with grand old mansions built in the auto industry’s 
           heyday. To the west is the city of Detroit, lined with abandoned houses and empty lots.
           On the east side of the street, getting a mortgage to buy a home is a breeze. On the west 
           side, it is hardly worth trying.

           Detroit is making a comeback after years of decline that led to a bankruptcy filing in 
          2013. But large swaths of the city are left behind, starved of the housing credit needed to 
          revive them. No purchase mortgages were made last year in almost a third of Detroit’s 
          census tracts, and fewer than five each in another third, according to data from 
          LendingPatterns.com, a mortgage-data analysis tool. """

    r1 = re.findall(r"([A-Z1-9]{1}.+?\.)\s*[A-Z|1-9]", txt, flags=re.IGNORECASE | re.MULTILINE)
    print(r1)

In the output, the first letter of each sentence is dropped, and I don't understand why.

Thanks for your help!

【Comments】:

  • I think you need re.findall(r'[A-Z0-9].*?\.(?=\s*(?:[A-Z0-9]|$))', text, re.DOTALL)

Tags: python regex
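
As a rough illustration of the pattern suggested in this comment, the sketch below runs it on a short, made-up sample (not the asker's full text) in which sentences wrap across lines. The `re.DOTALL` flag lets `.` match newlines, so a sentence that continues on the next line is still captured as one match; the whitespace cleanup step is an extra assumption, not part of the comment.

```python
import re

# Hypothetical sample text with sentences that span line breaks
text = """Alter Road runs northwest along
the border. To the east is a suburb. To the west
is the city. """

# Pattern from the comment; re.DOTALL makes "." match "\n" as well
sentences = re.findall(r"[A-Z0-9].*?\.(?=\s*(?:[A-Z0-9]|$))", text, re.DOTALL)

# Optional cleanup: collapse line breaks and repeated spaces inside each sentence
cleaned = [" ".join(s.split()) for s in sentences]

for sentence in cleaned:
    print(sentence)
```

Without `re.DOTALL`, `.` stops at a newline, so a sentence broken across lines would only match from the first uppercase letter or digit on the line that contains its period.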


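【Solution 1】:

You can use a lookahead for the check instead and omit the capture group, since you only need the match. In your pattern, the trailing [A-Z1-9] is part of the match, so it consumes the first character of the next sentence and the next match starts one character too late.

You can omit {1}, which is redundant. Because the text ends with whitespace, the lookahead can assert either \s*[A-Z1-9] or \s*$, so that the last sentence is matched as well.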
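
To make the difference concrete, here is a minimal sketch on a short, made-up string (not the asker's text). Both calls use `re.IGNORECASE` to mirror the question's code; the only difference is whether the check for the next sentence's first character is part of the match or sits inside a lookahead.

```python
import re

# Hypothetical one-line sample
text = "One apple. Two pears. Three plums."

# Consuming version: the trailing [A-Z1-9] belongs to the match, so the
# "T" of "Two" is used up and the next match can only start at "w".
consuming = re.findall(r"([A-Z1-9].+?\.)\s*[A-Z1-9]", text, flags=re.IGNORECASE)

# Lookahead version: (?=...) only checks what follows and consumes nothing,
# so the next match starts at the real beginning of the next sentence.
lookahead = re.findall(r"[A-Z1-9].+?\.(?=\s*[A-Z1-9])", text, flags=re.IGNORECASE)

print(consuming)  # ['One apple.', 'wo pears.']
print(lookahead)  # ['One apple.', 'Two pears.']
```

In both results the final sentence is absent, because nothing follows it for the trailing check to match.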

[A-Z1-9].+?\.(?=\s*(?:[A-Z1-9]|$))
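
The following sketch shows what the `|$` alternative in the lookahead contributes, using a made-up sample string that ends with a space, as the asker's text does. The variable names are illustrative only.

```python
import re

# Hypothetical sample; note the trailing space after the last period
text = "First point. Second point. 3 is a number. "

# Lookahead that only accepts an uppercase letter or digit after the period
pattern_without_end = r"[A-Z1-9].+?\.(?=\s*[A-Z1-9])"

# Pattern from the answer: also accepts end of string after optional whitespace
pattern_with_end = r"[A-Z1-9].+?\.(?=\s*(?:[A-Z1-9]|$))"

without_end = re.findall(pattern_without_end, text)
with_end = re.findall(pattern_with_end, text)

print(without_end)  # ['First point.', 'Second point.']
print(with_end)     # ['First point.', 'Second point.', '3 is a number.']
```

Note that `.` does not match a newline by default, so for text in which a sentence wraps across lines this pattern would likely need `re.DOTALL` in addition.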

