【Question Title】: How to split a text into sentences while keeping the punctuation? [closed]
【Posted】: 2020-12-01 06:49:45
【Question】:

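Hi all. I'm a novice at Python 3.8. I need to split a text into sentences while keeping the punctuation, and the text can also contain consecutive punctuation marks. I haven't learned regular expressions yet, so is there a way to do this with simple code such as find() and string slicing? I tried find() and slices, but my code doesn't handle every case. I'm looking for better ways to use find() and slicing. Thanks.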

【Question Discussion】:

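  • Does this answer your question? How can I split a text into sentences?
  • No, but thanks. :) I've already handed in my assignment and am waiting for my teacher's feedback. It will probably be easier for me once I've learned regular expressions. Thanks a lot!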

Tags: python-3.x string split


【Answer 1】:

You can use split('. ') to break the string into list elements wherever '. ' occurs.

Then, to keep the punctuation, append '. ' back onto every list element.

>>> text = "guys. I am a novice at Python 3.8. I need to split a text into sentences while keeping the punctuations. You can also encounter countinuous punctuations. I haven't learnt regular expression so is there a way using simple codes like find() and slicing strings to do this? I tried find() and slices but it is not a one-size-fits-all code. Looking forward to better ways of using find() and slice. Thanks."
>>> sentences = [f'{i}. ' for i in text.split('. ')]
>>> sentences
['guys. ', 'I am a novice at Python 3.8. ', 'I need to split a text into sentences while keeping the punctuations. ', 'You can also encounter countinuous punctuations. ', "I haven't learnt regular expression so is there a way using simple codes like find() and slicing strings to do this? I tried find() and slices but it is not a one-size-fits-all code. ", 'Looking forward to better ways of using find() and slice. ', 'Thanks.. ']
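As the output shows, appending '. ' to every element also adds a period to the last piece, which already ended with one ('Thanks.. '), and the '?' in the middle is not treated as a sentence end. A minimal sketch that keeps the same split('. ') idea but restores the period only where split() actually removed it (the function name split_on_period is hypothetical):

```python
def split_on_period(text):
    """Split text on '. ' and put back the period that split() removed."""
    parts = text.split('. ')
    # Every part except the last lost its trailing '.' to split(),
    # so re-append it; the last part keeps whatever ending it already had.
    return [part + '.' for part in parts[:-1]] + parts[-1:]


text = "guys. I am a novice at Python 3.8. Is this a question? Thanks."
print(split_on_period(text))
# ['guys.', 'I am a novice at Python 3.8.', 'Is this a question? Thanks.']
```

'3.8' survives because it is not followed by a space, but the question and the final sentence stay together: this approach still only knows about periods.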

【Discussion】:

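  • Yes, yes, this works. It doesn't work for large chunks of text, though. +1
  • :) Thanks a lot! I should have refreshed this page earlier. But what about sentence-ending punctuation such as puncts = set(';.!...?')?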
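For the follow-up about several sentence-ending marks, one way to stay within the question's no-regex constraint is to scan the text character by character using only indexing and slicing. A minimal sketch, assuming a sentence ends at a run of terminator characters that is followed by whitespace or the end of the text; the function name split_sentences and its terminators parameter are hypothetical. Note that set(';.!...?') contains just four characters (';', '.', '!', '?') because a set drops the repeated dots.

```python
def split_sentences(text, terminators=frozenset('.!?')):
    """Split text into sentences without regex, keeping closing punctuation."""
    sentences = []
    start = 0  # index where the current sentence begins
    i = 0
    n = len(text)
    while i < n:
        if text[i] in terminators:
            # Absorb a whole run of consecutive marks such as '...' or '?!'.
            end = i
            while end + 1 < n and text[end + 1] in terminators:
                end += 1
            # Break only if the run is followed by whitespace or the end of
            # the text, so a number like '3.8' is not cut in two.
            if end + 1 == n or text[end + 1].isspace():
                sentence = text[start:end + 1].strip()
                if sentence:
                    sentences.append(sentence)
                start = end + 1
            i = end + 1
        else:
            i += 1
    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


puncts = set(';.!...?')  # same set as in the comment: ';', '.', '!', '?'
print(split_sentences("Wait... Really?! I use Python 3.8; it works. Thanks", puncts))
# ['Wait...', 'Really?!', 'I use Python 3.8;', 'it works.', 'Thanks']
```

Because ';' is in the set, it ends a sentence too; leave it out of the set if semicolons should stay inside a sentence.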
【Answer 2】:

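If you have a lot of text, you may want to use a generator, which yields one sentence at a time instead of building a full list of copies up front.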

For example:

import re

paragraph = """
This is a sentence. This may be another one. I am not sure.
"""

sentence_regex = r'[^.]+\.'
# match one or more characters that are not periods, followed by a literal period


def find_sentences(text):
    for match in re.finditer(sentence_regex, text):
        yield match.group(0).strip()


for sentence in find_sentences(paragraph):
    print(sentence)

Running it:

[ttucker@zim stackoverflow]$ python sentence.py 
This is a sentence.
This may be another one.
I am not sure.
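The pattern above stops only at periods. A minimal sketch of the same generator idea, assuming a sentence ends at one or more of '.', '!' or '?' followed by whitespace or the end of the text, so that runs like '?!' or '...' stay attached and '3.8' is not split; the names SENTENCE_RE and iter_sentences are hypothetical. Trailing text with no closing punctuation is not yielded, and abbreviations such as 'e.g.' will still be treated as sentence ends.

```python
import re

# \S        : a sentence starts at the first non-whitespace character
# .*?       : then as few characters as possible (DOTALL lets this cross newlines)
# [.!?]+    : up to a run of closing punctuation such as '.', '?!' or '...'
# (?=\s|$)  : that is followed by whitespace or the end of the text
SENTENCE_RE = re.compile(r'\S.*?[.!?]+(?=\s|$)', re.DOTALL)


def iter_sentences(text):
    """Yield sentences one at a time, keeping their closing punctuation."""
    for match in SENTENCE_RE.finditer(text):
        yield match.group(0)


for sentence in iter_sentences("I use Python 3.8. Does it work?! Yes...\nGreat."):
    print(sentence)
```

The lookahead (?=\s|$) is what keeps '3.8' intact: the '.' inside the number is followed by '8', so the match keeps extending until a period that is followed by a space.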

【Discussion】:

  • Yes, that's the problem. We're going to learn regular expressions in the next class. T_T But thank you very much! You made my day.